Abstract

In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work [BLR08, RR10, DRV10, HT10, HR10, LHR+10, BDKT12]. For a given set of $d$ linear queries over a database $x \in \mathbb{R}^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, [HT10, BDKT12] give an $O(\log d)$ approximation to the optimal mechanism. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\epsilon,\delta)$-differential privacy. Our mechanism is simple, efficient and adds carefully chosen correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of [MN12], using tools from convex geometry.

We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n = \|x\|_1$. The lower bounds used in the previous approximation algorithm no longer apply, and in fact better mechanisms are known in this setting [BLR08, RR10, HR10, GHRU11, GRU12]. Our second main contribution is to give an $(\epsilon,\delta)$-differentially private mechanism that, for a given query set $A$ and an upper bound $n$ on $\|x\|_1$, has mean squared error within $\mathrm{polylog}(d, N)$ of the optimal for $A$ and $n$. This approximation is achieved by coupling the Gaussian noise addition approach with linear regression over the $\ell_1$ ball. Additionally, we show a similar polylogarithmic approximation guarantee for the best $\epsilon$-differentially private mechanism in this sparse setting. Our work also shows that for arbitrary counting queries, i.e. $A$ with entries in $\{0, 1\}$, there is an $\epsilon$-differentially private mechanism with expected error $\tilde{O}(\sqrt{n})$ per query, improving on the $\tilde{O}(n^{2/3})$ bound of [BLR08], and matching the lower bound implied by [DN03] up to logarithmic factors.

The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first 
polylogarithmic approximation to the hereditary discrepancy of a matrix A. 

1 Introduction 



Differential privacy [DMNS06] is a recent privacy definition that has quickly become the standard notion of privacy in statistical databases. Informally, a mechanism (a randomized function on databases) satisfies differential privacy if the distribution of the outcome of the mechanism does not change noticeably when one individual's input to the database is changed. Privacy is measured by how small this change must be: an $\epsilon$-differentially private ($\epsilon$-DP) mechanism $\mathcal{M}$ satisfies $\Pr[\mathcal{M}(x) \in S] \le \exp(\epsilon)\Pr[\mathcal{M}(x') \in S]$ for any pair $x, x'$ of neighboring databases, and for any measurable subset $S$ of the range. A relaxation of this definition is approximate differential privacy. A mechanism $\mathcal{M}$ is $(\epsilon,\delta)$-differentially private ($(\epsilon,\delta)$-DP) if $\Pr[\mathcal{M}(x) \in S] \le \exp(\epsilon)\Pr[\mathcal{M}(x') \in S] + \delta$ with $x, x', S$ as before. Here $\delta$ is thought of as negligible in the size of the database. Both these definitions satisfy several desirable properties such as composability, and are resistant to post-processing of the output of the mechanism.

In recent years, a large body of research has shown that this strong privacy definition still allows for very accurate analyses of statistical databases. At the same time, answering a large number of adversarially chosen queries accurately is inherently impossible with any semblance of privacy. Indeed, Dinur and Nissim [DN03] show that answering $\tilde{O}(d)$ random subset sums ($\tilde{O}$ hides polylogarithmic factors in $N, d, 1/\delta$) of a set of $d$ bits with (per query) error $o(\sqrt{d})$ allows an attacker to reconstruct (an arbitrarily good approximation to) all the private information. Thus there is an inherent trade-off between privacy and accuracy when answering a large number of queries. In this work, we study this trade-off in the context of counting queries, and more generally linear queries.

We think of the database as being given by a multiset of database rows, one for each individual. We will let $N$ denote the size of the universe that these rows come from, and we will denote by $n$ the number of individuals in the database. We can represent the database as its histogram $x \in \mathbb{R}^N$, with $x_i$ denoting the number of occurrences of the $i$th element of the universe. Thus $x$ would in fact be a vector of non-negative integers with $\|x\|_1 = n$. We will be concerned with reporting reasonably accurate answers to a given set of $d$ linear queries over this histogram $x$. This set of queries can naturally be represented by a matrix $A \in \mathbb{R}^{d \times N}$, with the vector $Ax \in \mathbb{R}^d$ giving the correct answers to the queries. When $A \in \{0, 1\}^{d \times N}$, we call such queries counting queries. We are interested in the (practical) regime where $N \gg d \gg n$, although our results hold for all settings of the parameters.

A differentially private mechanism will return a noisy answer to the query $A$ and, in this work, we measure the performance of the mechanisms in terms of their worst case total expected squared error. Suppose that $X \subseteq \mathbb{R}^N$ is the set of all possible databases. The error of a mechanism $\mathcal{M}$ is defined as $\mathrm{err}_{\mathcal{M}}(A, X) = \max_{x \in X} \mathbb{E}[\|\mathcal{M}(x) - Ax\|_2^2]$. Here the expectation is taken over the internal coin tosses of the mechanism itself, and we look at the worst case of this expected squared error over all the databases in $X$. Unless stated otherwise, the error of the mechanism will refer to this worst case expected $\ell_2^2$ error. Phrased thus, the Gaussian noise mechanism of Dwork et al. [DKM+06a] gives error at most $O(d^2)$ for any counting query and guarantees $(\epsilon,\delta)$-DP over all the databases, i.e. $X = \mathbb{R}^N$. Moreover, the aforementioned lower bounds imply that there exist counting queries for which this bound cannot be improved. For $\epsilon$-DP, Hardt and Talwar [HT10] gave a mechanism with error $O(d^2 \log(N/d))$ and showed that this is the best possible for random counting queries. Thus the worst case accuracy for counting queries is fairly well understood in this measure.
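To make the $O(d^2)$ figure concrete, here is a minimal Python sketch of the Gaussian noise mechanism for a query matrix given as a list of rows. It assumes the classical calibration $\sigma = R\sqrt{2\ln(1.25/\delta)}/\epsilon$ (stated for $0 < \epsilon < 1$), where $R$ is the largest Euclidean norm of a column of $A$; these constants are an assumption and not necessarily the ones used in [DKM+06a], and the function names are illustrative. For a counting query every column has norm at most $\sqrt{d}$, so the expected total squared error $d\sigma^2$ is $O(d^2)$ for fixed $\epsilon$ and $\delta$.

```python
import math
import random


def l2_sensitivity(A):
    """Largest Euclidean norm R of a column of A (A is a list of rows).

    If ||x - x'||_1 <= 1, then ||Ax - Ax'||_2 <= R.
    """
    d, N = len(A), len(A[0])
    return max(math.sqrt(sum(A[i][j] ** 2 for i in range(d))) for j in range(N))


def gaussian_sigma(R, eps, delta):
    # Assumed classical calibration, stated for 0 < eps < 1.
    return R * math.sqrt(2 * math.log(1.25 / delta)) / eps


def gaussian_mechanism(A, x, eps, delta, rng=random):
    """Return Ax plus independent N(0, sigma^2) noise in each coordinate."""
    sigma = gaussian_sigma(l2_sensitivity(A), eps, delta)
    answers = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return [ans + rng.gauss(0.0, sigma) for ans in answers]


def expected_total_squared_error(A, eps, delta):
    # E||M(x) - Ax||_2^2 = d * sigma^2, the same for every database x.
    return len(A) * gaussian_sigma(l2_sensitivity(A), eps, delta) ** 2
```

For the all-ones $4 \times 5$ counting query, $R = 2 = \sqrt{d}$, and the error is $d \cdot d \cdot 2\ln(1.25/\delta)/\epsilon^2$.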

Specific sets of counting queries of interest can, however, admit much better mechanisms than the adversarially chosen queries for which the lower bounds are shown. Indeed, several classes of specific queries have attracted attention. Some, such as range queries, are "easier", and asymptotically better mechanisms can be designed for them. Others, such as constant dimensional contingency tables, are nearly as hard as general counting queries, and asymptotically better mechanisms can be ruled out in some ranges of the parameters. These query-specific upper bounds are usually proved by carefully exploiting the structure of the query, and query-specific lower bounds have been proved by reconstruction attacks that exploit a lower bound on the smallest singular value of an appropriately chosen $A$ [DN03, DMT07, DY08, KRSU10, De12, KRS13]. It is natural to address this question in a competitive analysis framework: can we design an efficient algorithm that, given any query $A$, computes (even approximately) the minimum error differentially private mechanism for $A$?



Hardt and Talwar [HT10] answered this question in the affirmative for $\epsilon$-DP mechanisms, and gave a mechanism that has error within factor $O(\log^3 d)$ of the optimal, assuming a conjecture from convex geometry known as the hyperplane conjecture or the slicing conjecture. Bhaskara et al. [BDKT12] removed the dependence on the hyperplane conjecture and improved the approximation ratio to $O(\log d)$. Can relaxing the privacy requirement to $(\epsilon,\delta)$-DP help with accuracy? In many settings, $(\epsilon,\delta)$-DP mechanisms can be simpler and more accurate than the best known $\epsilon$-DP mechanisms. This motivates the first question we address.

Question 1 Given $A$, can we efficiently approximate the optimal error $(\epsilon,\delta)$-DP mechanism for it?



Hardt and Talwar [HT10] showed that for some $A$, the lower bound for $\epsilon$-DP mechanisms can be $\Omega(\log N)$ larger than the error of known $(\epsilon,\delta)$-DP mechanisms. For non-linear Lipschitz queries, De [De12] showed that this gap can be as large as $\Omega(\sqrt{d})$ (even when $N = d$). This leads us to ask:

Question 2 How large can the gap between the optimal $\epsilon$-DP mechanism and the optimal $(\epsilon,\delta)$-DP mechanism be for linear queries?

When the databases are sparse, e.g. when $\|x\|_1 \le n \ll d$, one may obtain better mechanisms. Blum, Ligett and Roth [BLR08] gave an $\epsilon$-DP mechanism that can answer any set of $d$ counting queries with error $\tilde{O}(dn^{4/3})$. A series of subsequent works [DNR+09, DRV10, RR10, HR10, GHRU11, HLM12, GRU12] led to $(\epsilon,\delta)$-DP mechanisms that have error only $\tilde{O}(dn)$. Thus when $n < d$, the lower bound of $\Omega(d^2)$ for arbitrary databases can be breached by exploiting the sparsity of the database. This motivates a more refined measure of error that takes the sparsity of $x$ into account. Given an $A$ and $n$, one can ask for the mechanism $\mathcal{M}$ that minimizes the sparse case error $\max_{x: \|x\|_1 \le n} \mathbb{E}[\|\mathcal{M}(x) - Ax\|_2^2]$. The next set of questions we study addresses this measure.



Question 3 Given $A$ and $n$, can we approximate the optimal sparse case error $(\epsilon,\delta)$-DP mechanism for $A$ when restricted to databases of size at most $n$?



Question 4 Given $A$ and $n$, can we approximate the optimal sparse case error $\epsilon$-DP mechanism for $A$ when restricted to databases of size at most $n$?



1 Here and in the rest of the introduction, we suppress the dependence of the error on $\epsilon$ and $\delta$.
The gap between the $\tilde{O}(dn^{4/3})$ error $\epsilon$-DP mechanism of [BLR08] and the $\tilde{O}(dn)$ error $(\epsilon,\delta)$-DP mechanism of [HR10] leads us to ask:

Question 5 Is there an $\epsilon$-DP mechanism with error $\tilde{O}(dn)$ for databases of size at most $n$?
1.1 Results 



In this work, we answer Questions 1-5 above. Denote by $B_1^N = \{x \in \mathbb{R}^N : \|x\|_1 \le 1\}$ the $N$-dimensional $\ell_1$ ball.



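Recall that for any query matrix $A \in \mathbb{R}^{d \times N}$ and any set $X \subseteq \mathbb{R}^N$, the (worst-case expected squared) error of $\mathcal{M}$ is defined as

$$\mathrm{err}_{\mathcal{M}}(A, X) \triangleq \max_{x \in X} \mathbb{E}[\|\mathcal{M}(x) - Ax\|_2^2].$$

In this paper, we are interested both in the case when $X = \mathbb{R}^N$, called the dense case, and when $X = nB_1^N$ for $n < d$, called the sparse case. We also write $\mathrm{err}_{\mathcal{M}}(A) = \mathrm{err}_{\mathcal{M}}(A, \mathbb{R}^N)$ and $\mathrm{err}_{\mathcal{M}}(A, n) = \mathrm{err}_{\mathcal{M}}(A, nB_1^N)$.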

Our first result is a simple and efficient mechanism that for query matrix $A$ gives an $O(\log^2 d)$ approximation to the optimal error.

Theorem 1 Given a query matrix $A \in \mathbb{R}^{d \times N}$, there is an efficient $(\epsilon,\delta)$-DP mechanism $\mathcal{M}$ and an efficiently computable lower bound $L_A$ such that

• $\mathrm{err}_{\mathcal{M}}(A) \le O(\log^2 d \log 1/\delta) \cdot L_A$, and

• for any $(\epsilon,\delta)$-DP mechanism $\mathcal{M}'$, $\mathrm{err}_{\mathcal{M}'}(A, d) \ge L_A$.

We also show that the gap of $\Omega(\log(N/d))$ between $\epsilon$-DP and $(\epsilon,\delta)$-DP mechanisms shown in [HT10] is essentially the worst possible, within a $\mathrm{polylog}(d)$ factor, for linear queries. More precisely, the lower bound on $\epsilon$-DP mechanisms used in [HT10] is always within $O(\log(N/d)\,\mathrm{polylog}(d))$ of the lower bound $L_A$ computed by our algorithm above. Let $\mathcal{M}_K$ denote the $\epsilon$-DP generalized $K$-norm mechanism in [HT10].

Theorem 2 For any $(\epsilon,\delta)$-DP mechanism $\mathcal{M}$, $\mathrm{err}_{\mathcal{M}}(A) = \Omega\big(1/(\log^{O(1)}(d)\,\log(N/d))\big) \cdot \mathrm{err}_{\mathcal{M}_K}(A)$.

We next move to the sparse case. Here we give results analogous to the dense case with a slightly worse approximation ratio.

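Theorem 3 Given $A \in \mathbb{R}^{d \times N}$ and a bound $n$, there is an efficient $(\epsilon,\delta)$-DP mechanism $\mathcal{M}$ and an efficiently computable lower bound $L_{A,n}$ such that

• $\mathrm{err}_{\mathcal{M}}(A, n) \le O\big(\log^{3/2} d \cdot \sqrt{\log N \log 1/\delta} + \log^2 d \log 1/\delta\big) \cdot L_{A,n}$, and

• for any $(\epsilon,\delta)$-DP mechanism $\mathcal{M}'$, $\mathrm{err}_{\mathcal{M}'}(A, n) \ge L_{A,n}$.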

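Theorem 4 Given $A \in \mathbb{R}^{d \times N}$ and a bound $n$, there is an efficient $\epsilon$-DP mechanism $\mathcal{M}$ and an efficiently computable lower bound $L_{A,n}$ such that

• $\mathrm{err}_{\mathcal{M}}(A, n) \le O(\log^{O(1)} d \cdot \log^{3/2} N) \cdot L_{A,n}$, and

• for any $\epsilon$-DP mechanism $\mathcal{M}'$, $\mathrm{err}_{\mathcal{M}'}(A, n) \ge L_{A,n}$.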

We remark that in these theorems, our upper bounds hold for all $x$ with $\|x\|_1 \le n$, whereas the lower bounds hold even when $x$ is an integer vector.

The $(\epsilon,\delta)$-DP mechanism of Theorem 3, when run on any counting query, has error no larger than the best known bounds [GRU12] for counting queries, up to constants (not ignoring logarithmic factors). The $\epsilon$-DP mechanism of Theorem 4, when run on any counting query, can be shown to have nearly the same asymptotics, answering Question 5 in the affirmative.

Theorem 5 For any counting query $A$, there is an $\epsilon$-DP mechanism $\mathcal{M}$ such that $\mathrm{err}_{\mathcal{M}}(A, n) = \tilde{O}(dn)$.
We now summarize some key ideas we use to achieve these results. More details follow in Section 1.2.



For the upper bounds, the first crucial step is to decompose $A$ into "geometrically nice" components and then add Gaussian noise to each component. This is similar to the approach in [HT10, BDKT12], but we use the minimum volume enclosing ellipsoid, rather than the M-ellipsoid used in those works, to facilitate the decomposition process. This allows us to handle the approximate and the sparse cases. In addition, it simplifies the mechanism as well as the analysis. For the sparse case, we further couple the mechanism with least squares estimation of the noisy answer with respect to $nAB_1^N$. By utilizing techniques from statistical estimation, we can show that this process can reduce the error when $n < d$, and prove an error upper bound dependent on the size of the smallest projection of $nAB_1^N$.

For the lower bounds, we first lower bound the error of any $(\epsilon,\delta)$-DP mechanism by the hereditary discrepancy of the query matrix $A$, which we in turn lower bound in terms of the least singular values of submatrices of $A$. Finally, we close the loop by utilizing the restricted invertibility principle of Bourgain and Tzafriri [BT87] and its extension by Vershynin [Ver01], which, informally, shows that if there does not exist a "small" projection of $nAB_1^N$, then $A$ has a "large" submatrix with a "large" least singular value.

Approximating Hereditary Discrepancy 

The discrepancy of a matrix $A \in \mathbb{R}^{d \times N}$ is defined to be $\mathrm{disc}(A) = \min_{x \in \{-1,+1\}^N} \|Ax\|_\infty$. The hereditary discrepancy of a matrix is defined as $\mathrm{herdisc}(A) = \max_{S \subseteq [N]} \mathrm{disc}(A|_S)$, where $A|_S$ denotes the matrix $A$ restricted to the columns indexed by $S$.
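Both definitions can be evaluated directly on tiny matrices. The sketch below is a brute-force illustration (exponential in $N$, so usable only for a handful of columns), not an algorithm from this paper; `disc` and `herdisc` are hypothetical helper names, and the matrix is given as a list of rows.

```python
from itertools import combinations, product


def disc(A, cols):
    """min over signs x in {-1,+1}^cols of ||A|_cols x||_inf (brute force)."""
    if not cols:
        return 0
    best = float("inf")
    for signs in product((-1, 1), repeat=len(cols)):
        # Largest absolute row sum of the signed submatrix.
        worst_row = max(abs(sum(s * row[j] for s, j in zip(signs, cols))) for row in A)
        best = min(best, worst_row)
    return best


def herdisc(A):
    """max of disc(A|_S) over all nonempty column subsets S (exponential time)."""
    N = len(A[0])
    return max(disc(A, S)
               for k in range(1, N + 1)
               for S in combinations(range(N), k))
```

For the incidence matrix of a triangle (three rows, each with two ones), any signing gives two of the three columns the same sign, so some row has discrepancy 2 and both quantities equal 2.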

As hereditary discrepancy is a maximum over exponentially many submatrices, it is not a priori clear whether there even exists a polynomial-time verifiable certificate for low hereditary discrepancy. Additionally, we can show that it is NP-hard to approximate hereditary discrepancy to within a factor of 3/2. Bansal [Ban10] gave a pseudo-approximation algorithm for hereditary discrepancy, which efficiently computes a coloring of discrepancy at most a factor of $O(\log dN)$ larger than $\mathrm{herdisc}(A)$ for a $d \times N$ matrix $A$. His algorithm allows efficiently computing a lower bound on herdisc for any restriction $A|_S$; however, such a lower bound may be arbitrarily loose, and before our work it was not known how to efficiently compute nearly matching lower and upper bounds on herdisc.

Muthukrishnan and Nikolov [MN12] show that for a query matrix $A \in \mathbb{R}^{d \times N}$, the error of any $(\epsilon,\delta)$-DP mechanism is lower bounded by (an $\ell_2$ version of) $(\mathrm{herdisc}(A))^2$ (up to logarithmic factors). Moreover, the lower bound used in Theorem 1 is in fact a lower bound on this version of $\mathrm{herdisc}(A)$. Using the von Neumann minimax theorem, we can go between the $\ell_2$ and the $\ell_\infty$ versions of these concepts, allowing us to sandwich the hereditary discrepancy of $A$ between two quantities: a determinant based lower bound and the efficiently computable expected error of the private mechanism. As the two quantities are nearly matching, our work therefore leads to a polylogarithmic approximation to the hereditary discrepancy of any matrix $A$.

1.2 Techniques 

In addition to known techniques from the differential privacy literature, our work borrows tools from discrepancy 
theory, convex geometry and statistical estimation. We next briefly describe how they fit in. 

Central to designing a provably good approximation algorithm is an efficiently computable lower bound on the optimum. Muthukrishnan and Nikolov [MN12] proved that (a slight variant of) the hereditary discrepancy of $A$ leads to a lower bound for the error of any $(\epsilon,\delta)$-DP mechanism. Lovász, Spencer and Vesztergombi [LSV86] showed that hereditary discrepancy itself can be lower bounded by a quantity called the determinant lower bound. Geometrically, this lower bound corresponds to picking the $d$ columns of $A$ that (along with the origin) give us a simplex with the largest possible volume. The volume of this simplex, appropriately normalized, gives us a lower bound on OPT. More precisely, for any simplex $S$, $d^3 \cdot \mathrm{vol}(S)^{2/d}/\log^2 d$ gives a lower bound on the error. The $\log^2 d$ factor can be removed by using a lower bound based on the least singular values of submatrices of $A$. Geometrically, for the least singular value lower bound we need to find a simplex of large volume whose $d$ non-zero vertices are also nearly pairwise orthogonal.
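For intuition, the determinant lower bound of [LSV86] states that $\mathrm{herdisc}(A) \ge \frac{1}{2}\max_k \max_B |\det B|^{1/k}$, where $B$ ranges over the $k \times k$ submatrices of $A$. The brute-force sketch below computes the maximum on the right-hand side for small matrices, using exact rational arithmetic for the determinants. It enumerates all square submatrices and only makes the quantity concrete; `det` and `detlb` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant of a square matrix via Gaussian elimination over Fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular: no nonzero pivot in this column
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign  # a row swap flips the sign of the determinant
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return sign * result


def detlb(A):
    """max over k and k x k submatrices B of |det B|^(1/k) (brute force)."""
    d, N = len(A), len(A[0])
    best = 0.0
    for k in range(1, min(d, N) + 1):
        for rows in combinations(range(d), k):
            for cols in combinations(range(N), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                best = max(best, abs(float(det(sub))) ** (1.0 / k))
    return best
```

For the matrix with rows $(1, 1)$ and $(1, -1)$ the maximum is $\sqrt{2}$, attained by the whole matrix, so its hereditary discrepancy is at least $\sqrt{2}/2$.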

If the $N$ columns of $A$ all lie in a ball of radius $R$, it can be shown that adding Gaussian noise proportional to $R$ suffices to guarantee $(\epsilon,\delta)$-DP, resulting in a mechanism having total squared error $dR^2$. Can we relate this quantity to the lower bound? It turns out that if the ball of radius $R$ is the minimum volume ellipsoid containing the columns of $A$, this can be done. In this case, a result of Vershynin [Ver01], building on the restricted invertibility results of Bourgain and Tzafriri [BT87], tells us that one can find $\Omega(d)$ vertices of $K$, the symmetric convex hull of the columns of $A$, that touch the minimum containing ellipsoid and are nearly orthogonal. The simplex formed by these vertices therefore has large volume, giving us an $(\epsilon,\delta)$-DP lower bound of $\Omega(dR^2)$. In this case, the Gaussian mechanism with the
optimal $R$ is within a constant factor of the lower bound. When the minimum volume enclosing ellipsoid is not a ball, we need to project the query along the $d/2$ shortest axes of this ellipsoid, answer this projection using the Gaussian mechanism, and recurse on the orthogonal projection. Using the full power of the restricted invertibility result of Vershynin allows us to construct a large simplex and prove our competitive ratio.



Hardt and Talwar [HT10] also used a volume based lower bound, but for $\epsilon$-DP mechanisms one can take $K$, the symmetric convex hull of all the columns of $A$, and use its volume instead of the volume of $S$ in the lower bound above. How do these lower bounds compare? By a result of Bárány and Füredi [BF88] and Gluskin [Glu07], one can show that the volume of the convex hull of $N$ points can be bounded by $(\log N)^{d/2} d^{-d/2}$ times that of the minimum enclosing ellipsoid. This, along with the aforementioned restricted invertibility results, allows us to prove that the $\epsilon$-DP lower bound is within $O(\log N \cdot \mathrm{polylog}\, d)$ of the $(\epsilon,\delta)$-DP lower bound.

How do we handle sparse queries? The first observation is that the lower bounding technique gives us $d$ columns of $A$, and the resulting lower bound holds not just for $A$ but even for the $d \times d$ submatrix of $A$ corresponding to the maximum volume simplex $S$; moreover, the lower bound holds even when all databases are restricted to $O(d)$ individuals. Thus the lower bound holds when $n = \Omega(d)$, and this value marks the transition between the sparse and the dense cases. Moreover, when the minimum volume ellipsoid containing the columns of $A$ is a ball, the restricted invertibility principle of Bourgain and Tzafriri and Vershynin gives us a $d$-dimensional simplex with nearly pairwise orthogonal vertices, and therefore any $n$-dimensional face of this simplex is another simplex of large volume. The large $n$-dimensional simplex gives a lower bound on error when databases are restricted to have at most $n$ individuals.

For smaller $n$, the error added by the Gaussian mechanism may be too large, and even though the value $Ax$ lies in $nAB_1^N$, the noisy answer will likely fall outside this set. A common technique in statistical estimation for handling such error is to "project" the noisy point back into $nAB_1^N$, i.e. report the point $\hat{y}$ in $nAB_1^N$ that minimizes the Euclidean distance to the noisy answer $\tilde{y}$. This projection step provably reduces the expected error! Geometrically, we use well known techniques from statistics to show that the error after projection is bounded by the "shadow" that $nAB_1^N$ leaves on the noise vector; this shadow is much smaller than the length of the noise vector when $n = o(d)$. In fact, when the noise is a spherical Gaussian, it can be shown that $\|\hat{y} - y\|_2^2$ is only about $\frac{n}{d}\|\tilde{y} - y\|_2^2$. This gives near optimal bounds for the case when the minimum volume ellipsoid is a ball; the general case is handled using a recursive mechanism as before.
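As a concrete special case of the projection step, take $A = I_d$, so that $nAB_1^N$ is the scaled cross-polytope $nB_1^d$. In that case the Euclidean projection can be computed exactly with the standard sort-and-threshold (soft-thresholding) method. This is an illustrative stand-in for the general least squares step over $nAB_1^N$, which would need a convex quadratic program solver. The function names are hypothetical.

```python
import math


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {z : ||z||_1 <= radius}, radius > 0."""
    if sum(abs(t) for t in v) <= radius:
        return list(v)  # already inside the ball: projection is the identity
    u = sorted((abs(t) for t in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        tj = (cumsum - radius) / j
        if uj > tj:
            theta = tj  # threshold from the largest index j with u_j > t_j
    # Soft-threshold every coordinate by theta, keeping its sign.
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in v]


def squared_error(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
```

For example, with $y = e_1 \in B_1^4$ and noise $w = (0.3, -0.45, 0.5, 0.2)$, projecting $\tilde{y} = y + w$ onto $B_1^4$ gives a point with $\ell_1$ norm exactly 1 whose squared distance to $y$ is roughly $0.02$, compared with $\|w\|_2^2 \approx 0.58$. For a general $A$ one would instead minimize $\|Az - \tilde{y}\|_2^2$ over $\|z\|_1 \le n$ and report $Az$.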



To get an $\epsilon$-DP mechanism, we use the $K$-norm mechanism [HT10] instead of Gaussian noise. To bound the shadow of $nAB_1^N$ on $w$, where $w$ is the noise vector generated by the $K$-norm mechanism, we first analyze the expectation of $\langle a_i, w\rangle$ for any column $a_i$ of $A$, and we use the log-concavity of the noise distribution to prove concentration of this random variable. A union bound helps complete the argument as in the Gaussian case.
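Here is a minimal sketch of $K$-norm noise in the simplest case $A = I_d$, where the sensitivity polytope $K = AB_1^d$ is the cross-polytope $B_1^d$. It uses the known sampling recipe: draw a radius $r \sim \mathrm{Gamma}(d+1, 1/\epsilon)$ and a uniform point $z \in K$, then output $rz$. The resulting density is proportional to $\exp(-\epsilon\|z\|_K)$. A uniform point of $B_1^d$ is obtained from normalized exponential variables with random signs. For general $A$, sampling uniformly from $K$ is the hard part and is not shown. The function names are illustrative.

```python
import random


def uniform_in_l1_ball(d, rng):
    """Uniform point in B_1^d: normalized exponentials with random signs."""
    e = [rng.expovariate(1.0) for _ in range(d + 1)]
    total = sum(e)
    return [(1.0 if rng.random() < 0.5 else -1.0) * e[i] / total for i in range(d)]


def k_norm_noise_l1(d, eps, rng):
    """Noise with density proportional to exp(-eps * ||z||_1), i.e. K = B_1^d.

    Radius r ~ Gamma(shape d + 1, scale 1 / eps) times a uniform point of K.
    """
    r = rng.gammavariate(d + 1, 1.0 / eps)
    return [r * t for t in uniform_in_l1_ball(d, rng)]


def k_norm_mechanism_identity(x, eps, rng):
    # Hypothetical helper for A = I_d, whose sensitivity polytope is B_1^d.
    return [xi + zi for xi, zi in zip(x, k_norm_noise_l1(len(x), eps, rng))]
```

Because $\exp(-\epsilon\|z\|_1)$ factorizes over coordinates, for $K = B_1^d$ this reduces to independent Laplace noise of scale $1/\epsilon$, so the average magnitude of a coordinate should be close to $1/\epsilon$.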



1.3 Related Work 



Dwork et al. [DMNS06] showed that any query can be released while adding noise proportional to the total sensitivity of the query. This motivated the question of designing mechanisms with good guarantees for any set of low sensitivity queries. Nissim, Raskhodnikova and Smith [NRS07] showed that adding noise proportional to (a smoothed version of) the local sensitivity of the query suffices for guaranteeing differential privacy; this may be much smaller than the worst case sensitivity for non-linear queries. Lower bounds on the amount of noise needed for general low sensitivity queries have been shown in [DN03, DMT07, DY08, DMNS06, RHS07, HT10, De12]. Kasiviswanathan et al. [KRSU10] showed upper and lower bounds for contingency table queries, and more recently [KRS13] showed lower bounds on publishing error rates of classifiers or even M-estimators. Muthukrishnan and Nikolov [MN12] showed that combinatorial discrepancy lower bounds the noise for answering any set of linear queries.



Using learning theoretic techniques, Blum, Ligett and Roth [BLR08] first showed that one can exploit sparsity of the database, and answer a large number of counting queries with error small compared to the number of individuals in the database. This line of work has been further extended and improved in terms of error bounds, efficiency, generality and interactivity in several subsequent works [DNR+09, DRV10, RR10, HR10, GHRU11, HLM12].



Ghosh, Roughgarden and Sundararajan [GRS09] showed that for any one dimensional counting query, a discrete version of the Laplacian mechanism is optimal for pure privacy in a very general utilitarian framework, and Gupte and Sundararajan [GS10] extended this to risk averse agents. Brenner and Nissim [BN10] showed that such universally optimal private mechanisms do not exist for two counting queries or for a single non-binary sum query. As mentioned above, Hardt and Talwar [HT10] and Bhaskara et al. [BDKT12] gave relative guarantees for multi-dimensional queries under pure privacy with respect to total squared error. De [De12] unified and strengthened these bounds and showed stronger lower bounds for the class of non-linear low sensitivity queries.



For specific queries of interest, improved upper bounds are known. Barak et al. [BCD+07] studied low dimensional marginals and showed that by running the Laplace mechanism on a different set of queries, one can reduce error. Using a similar strategy, improved mechanisms were given by [XWG10, CSS10] for orthogonal counting queries, and near optimal mechanisms were given by Muthukrishnan and Nikolov [MN12] for halfspace counting queries. The approach of answering a set of queries different from the target query set has also been studied in more generality and for other sets of queries by [LHR+10, DWHL11, RHS07, XWG10, XXY10, YZW+12]. Li and Miklau [LM12a, LM12b] study a class of mechanisms called extended matrix mechanisms and show that one can efficiently find the best mechanisms from this class. Hay et al. [HRMS10] show that in certain settings such as unattributed histograms, correcting noisy answers to enforce a consistency constraint can improve accuracy.

Very recently, Fawaz et al. [FMN] used the hereditary discrepancy lower bounds of Muthukrishnan and Nikolov, as well as the determinant lower bound on discrepancy of Lovász, Spencer, and Vesztergombi, to prove that a certain Gaussian noise mechanism is nearly optimal (in the dense setting) for computing any given convolution map. Like our algorithms, their algorithm adds correlated Gaussian noise; however, they always use the Fourier basis to correlate the noise.



We refer the reader to texts by Chazelle [Cha00] and Matoušek [Mat99] and the chapter by Beck and Sós [BS95] for an introduction to discrepancy theory. Bansal [Ban10] showed that a semidefinite relaxation can be used to design a pseudo-approximation algorithm for hereditary discrepancy. Matoušek [Mat11] showed that the determinant based lower bound of Lovász, Spencer and Vesztergombi [LSV86] is tight up to polylogarithmic factors. Larsen [Lar11] showed applications of hereditary discrepancy to data structure lower bounds, and Chandrasekaran and Vempala [CV11] recently showed applications of hereditary discrepancy to problems in integer programming.



Roadmap. In Section 2 we introduce relevant preliminaries. In Section 3 we present our main results for approximate differential privacy, and in Section 4 we present our main results for pure differential privacy. In Section 5 we prove absolute upper bounds on the error required for privately answering sets of $d$ counting queries. In Section 6 we give some extensions and applications of our main results, namely an optimal efficient mechanism for error in the dense case, and the efficient approximation to hereditary discrepancy implied by
that mechanism. We conclude in Section 7.



2 Preliminaries 



We start by introducing some basic notation. 

Let $B_1^d$ and $B_2^d$ be, respectively, the $\ell_1$ and $\ell_2$ unit balls in $\mathbb{R}^d$. Also, let $\mathrm{sym}\{a_1, \ldots, a_N\}$ be the convex hull of the vectors $\pm a_1, \ldots, \pm a_N$. Equivalently, $\mathrm{sym}\{a_1, \ldots, a_N\} = AB_1^N$, where $A$ is a matrix whose columns equal $a_1, \ldots, a_N$.

For a $d \times N$ matrix $A$ and a set $S \subseteq [N]$, we denote by $A|_S$ the submatrix of $A$ consisting of those columns of $A$ indexed by elements of $S$. Occasionally we refer to a matrix $V$ whose columns form an orthonormal basis for some subspace of interest $\mathcal{V}$ as the orthonormal basis of $\mathcal{V}$. $\mathcal{P}_k$ is the set of orthogonal projections onto $k$-dimensional subspaces of $\mathbb{R}^d$.

By $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ we denote, respectively, the smallest and largest singular value of $A$, i.e. $\sigma_{\min}(A) = \min_{x: \|x\|_2 = 1} \|Ax\|_2$ and $\sigma_{\max}(A) = \max_{x: \|x\|_2 = 1} \|Ax\|_2$. In general, $\sigma_i(A)$ is the $i$-th largest singular value of $A$, and $\lambda_i(A)$ is the $i$-th largest eigenvalue of $A$. We recall the minimax characterization of eigenvalues for symmetric matrices:

$$\lambda_i(A) = \max_{\mathcal{V}: \dim \mathcal{V} = i} \ \min_{x \in \mathcal{V}: \|x\|_2 = 1} x^T A x.$$

For a matrix $A$ (and the corresponding linear operator), we denote by $\|A\|_2 = \sigma_{\max}(A)$ the spectral norm of $A$ and by $\|A\|_F = \sqrt{\sum_i \sigma_i^2(A)} = \sqrt{\sum_{i,j} a_{i,j}^2}$ the Frobenius norm of $A$. By $\ker A$ we denote the kernel of $A$, i.e. the subspace of vectors $x$ for which $Ax = 0$.
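To make these definitions concrete for a $2 \times 2$ matrix, the sketch below computes $\sigma_1 \ge \sigma_2$ as the square roots of the eigenvalues of $A^TA$, using the closed form for symmetric $2 \times 2$ matrices. It can be used to check $\|A\|_F^2 = \sigma_1^2 + \sigma_2^2$ and $\sigma_{\min}(A) \le \|Ax\|_2/\|x\|_2 \le \sigma_{\max}(A)$. It is an illustration only, and the helper names are hypothetical.

```python
import math


def sym2_eigs(M):
    """Eigenvalues, largest first, of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return mid + rad, mid - rad


def singular_values_2x2(A):
    """(sigma_max, sigma_min) of A as square roots of the eigenvalues of A^T A."""
    AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    lam1, lam2 = sym2_eigs(AtA)
    return math.sqrt(lam1), math.sqrt(max(lam2, 0.0))  # clamp tiny negative round-off
```

For $A$ with rows $(1, 2)$ and $(3, 4)$, one gets $\sigma_1^2 + \sigma_2^2 = 30 = \|A\|_F^2$ and $\sigma_1\sigma_2 = |\det A| = 2$.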

2.1 Geometry 

For a set $K \subseteq \mathbb{R}^d$, we denote by $\mathrm{vol}_d(K)$ its $d$-dimensional volume. Often we use instead the volume radius

$$\mathrm{vrad}_d(K) \triangleq \left(\mathrm{vol}_d(K)/\mathrm{vol}_d(B_2^d)\right)^{1/d}.$$

Subscripts are omitted when this does not cause confusion. When $K$ lies in a $k$-dimensional affine subspace of $\mathbb{R}^d$, $\mathrm{vol}(K)$ and $\mathrm{vrad}(K)$ (without subscripts) are understood to mean $\mathrm{vol}_k$ and $\mathrm{vrad}_k$, respectively.
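As an illustration of the volume radius, the sketch below computes $\mathrm{vrad}$ for the cube $[-1,1]^d$ and the cross-polytope $B_1^d$. It uses the standard formulas $\mathrm{vol}(B_2^d) = \pi^{d/2}/\Gamma(d/2+1)$, $\mathrm{vol}([-1,1]^d) = 2^d$ and $\mathrm{vol}(B_1^d) = 2^d/d!$, and works with logarithms to avoid overflow. A short calculation with Stirling's formula gives $\mathrm{vrad}(B_1^d) \approx \sqrt{2e/(\pi d)}$, so the cross-polytope is much "smaller" in volume than the ball $B_2^d$ that contains it. The helper names are illustrative.

```python
import math


def log_vol_l2_ball(d):
    """log vol(B_2^d) = (d/2) log(pi) - log Gamma(d/2 + 1)."""
    return (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)


def vrad_from_log_vol(log_vol, d):
    """vrad_d(K) = (vol_d(K) / vol_d(B_2^d))^(1/d), computed in log space."""
    return math.exp((log_vol - log_vol_l2_ball(d)) / d)


def log_vol_cube(d):
    return d * math.log(2)  # [-1, 1]^d has volume 2^d


def log_vol_cross_polytope(d):
    return d * math.log(2) - math.lgamma(d + 1)  # B_1^d has volume 2^d / d!
```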

For a convex body $K \subseteq \mathbb{R}^d$, the polar body $K^\circ$ is defined by $K^\circ = \{y : \langle y, x\rangle \le 1 \ \forall x \in K\}$. The fundamental fact about polar bodies we use is that for any two convex bodies $K$ and $L$,

$$K \subseteq L \Leftrightarrow L^\circ \subseteq K^\circ. \qquad (1)$$

In the remainder of this paper, when we claim that a fact follows "by convex duality," we mean that it is implied by (1).

A convex body $K$ is (centrally) symmetric if $-K = K$. The Minkowski norm $\|x\|_K$ induced by a symmetric convex body $K$ is defined as $\|x\|_K = \min\{r \in \mathbb{R} : x \in rK\}$. The Minkowski norm induced by the polar body $K^\circ$ of $K$ is the dual norm of $\|x\|_K$ and also has the form $\|y\|_{K^\circ} = \max_{x \in K} \langle x, y\rangle$. For convex symmetric $K$, the induced norm and dual norm satisfy Hölder's inequality:

$$|\langle x, y\rangle| \le \|x\|_K \|y\|_{K^\circ}. \qquad (2)$$
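For $K = \mathrm{sym}\{a_1, \ldots, a_N\} = AB_1^N$, the dual norm has a simple closed form. A linear functional is maximized over a polytope at a vertex, so $\|y\|_{K^\circ} = \max_i |\langle a_i, y\rangle|$. Moreover, if $x = \sum_i c_i a_i$ then $\|x\|_K \le \sum_i |c_i|$. The sketch below uses these two facts to check inequality (2) numerically. The helper names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_norm_sym(cols, y):
    """||y||_{K°} for K = sym{a_1, ..., a_N}: the max of <x, y> over K is
    attained at a vertex +-a_i, hence equals max_i |<a_i, y>|."""
    return max(abs(dot(a, y)) for a in cols)


def holder_holds(cols, coeffs, y):
    """Check |<x, y>| <= (sum_i |c_i|) * ||y||_{K°} for x = sum_i c_i a_i.

    When sum |c_i| > 0, x / sum |c_i| lies in K, so sum |c_i| >= ||x||_K.
    """
    d = len(cols[0])
    x = [sum(c * a[k] for c, a in zip(coeffs, cols)) for k in range(d)]
    return abs(dot(x, y)) <= sum(abs(c) for c in coeffs) * dual_norm_sym(cols, y) + 1e-12
```

With the columns $e_1, e_2$ we get $K = B_1^2$, and the dual norm reduces to the $\ell_\infty$ norm; for instance, $\|(3, -4)\|_{K^\circ} = 4$.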



An ellipsoid in $\mathbb{R}^d$ is the image of $B_2^d$ under an affine map. All ellipsoids we consider are symmetric, and therefore are equal to an image $FB_2^d$ of the ball $B_2^d$ under a linear map $F$. A full dimensional ellipsoid $E = FB_2^d$ can be equivalently defined as $E = \{x : x^T (FF^T)^{-1} x \le 1\}$. The polar body of a symmetric ellipsoid $E = FB_2^d$ is the ellipsoid (or cylinder with an ellipsoid as its base in case $F$ is not full dimensional) $E^\circ = \{x : x^T FF^T x \le 1\}$.
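The two descriptions can be checked numerically in the plane. In the sketch below, points $Fu$ with $\|u\|_2 = 1$ lie on the boundary $x^T(FF^T)^{-1}x = 1$. A vector $y$ scaled so that $y^TFF^Ty = 1$ lies in $E^\circ$ and satisfies $\langle x, y\rangle \le 1$ for all $x \in E$, since $\max_{x \in E}\langle x, y\rangle = \|F^Ty\|_2$. This is a minimal $2 \times 2$ illustration assuming an invertible $F$; the helper names are hypothetical.

```python
import math


def gram(F):
    """Return F F^T for a 2x2 matrix F given as a list of rows."""
    return [[sum(F[i][k] * F[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad(M, x):
    """The quadratic form x^T M x."""
    return sum(x[i] * M[i][j] * x[j] for i in range(2) for j in range(2))


def in_ellipsoid(F, x, tol=1e-12):
    # E = F B_2^2 = {x : x^T (F F^T)^{-1} x <= 1} for invertible F.
    return quad(inv2(gram(F)), x) <= 1 + tol


def in_polar(F, y, tol=1e-12):
    # E° = {y : y^T F F^T y <= 1}, i.e. ||F^T y||_2 <= 1.
    return quad(gram(F), y) <= 1 + tol


def boundary_point(F, angle):
    """The image F u of the unit vector u = (cos angle, sin angle)."""
    u = (math.cos(angle), math.sin(angle))
    return [F[0][0] * u[0] + F[0][1] * u[1], F[1][0] * u[0] + F[1][1] * u[1]]
```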

We repeatedly use a classical theorem of Fritz John, characterizing the (unique) minimum volume enclosing ellipsoid (MEE) of any convex body $K$. We note that John's theorem is frequently stated in terms of the maximum volume ellipsoid enclosed in $K$; the two variants of the theorem are equivalent by convex duality. The MEE of $K$ is also known as the Löwner or Löwner-John ellipsoid of $K$.

Theorem 6 ([Joh48]) Any convex body $K \subseteq \mathbb{R}^d$ is contained in a unique ellipsoid of minimal volume. This ellipsoid is $B_2^d$ if and only if there exist unit vectors $u_1, \ldots, u_m \in K \cap B_2^d$ and positive reals $c_1, \ldots, c_m$ such that

$$\sum_i c_i u_i = 0, \qquad \sum_i c_i u_i u_i^T = I.$$
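John's conditions are easy to verify for two standard bodies whose MEE is $B_2^d$. For the cross-polytope $B_1^d$, the contact points are $\pm e_i$ with weights $c = 1/2$. For the scaled cube $d^{-1/2}[-1,1]^d$, the contact points are its $2^d$ vertices with weights $c = d/2^d$. The sketch checks only the unit norms, positivity of the weights and the two identities; that the points lie in $K$ is assumed by construction. Taking the trace of the second identity shows that $\sum_i c_i = d$ in general. The helper names are illustrative.

```python
import math
from itertools import product


def satisfies_john(points, weights, tol=1e-9):
    """Check ||u_i||_2 = 1, c_i > 0, sum c_i u_i = 0 and sum c_i u_i u_i^T = I."""
    d = len(points[0])
    if any(abs(math.sqrt(sum(t * t for t in u)) - 1) > tol for u in points):
        return False
    if any(c <= 0 for c in weights):
        return False
    centre = [sum(c * u[k] for c, u in zip(weights, points)) for k in range(d)]
    frame = [[sum(c * u[i] * u[j] for c, u in zip(weights, points)) for j in range(d)]
             for i in range(d)]
    return (all(abs(t) < tol for t in centre)
            and all(abs(frame[i][j] - (1.0 if i == j else 0.0)) < tol
                    for i in range(d) for j in range(d)))


def cross_polytope_contacts(d):
    """Contact points +-e_i of B_1^d with B_2^d, each with weight 1/2."""
    pts = [[s if k == i else 0.0 for k in range(d)] for i in range(d) for s in (1.0, -1.0)]
    return pts, [0.5] * len(pts)


def scaled_cube_contacts(d):
    """Vertices of d^{-1/2}[-1, 1]^d, each with weight d / 2^d."""
    pts = [[s / math.sqrt(d) for s in v] for v in product((1.0, -1.0), repeat=d)]
    return pts, [d / 2 ** d] * len(pts)
```

A body whose contact points are only $\pm e_1$ in the plane fails the second identity, reflecting that its MEE is not $B_2^2$.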



According to John's characterization, when the MEE of $K$ is the ball $B_2^d$, the contact points of $K$ and $B_2^d$ satisfy a structural property: the identity decomposes into a linear combination of the projection matrices onto the lines of the contact points. Intuitively, this means that $K$ "hits" $B_2^d$ in all directions; it has to, or otherwise $B_2^d$ can be "pinched" in order to produce a smaller ellipsoid that still contains $K$. This intuition is formalized by a theorem of Vershynin, which generalizes the work of Bourgain and Tzafriri on restricted invertibility [BT87].



Vershynin ([Ver01], Theorem 3.1) shows that there exist $\Omega(d)$ contact points of $K$ and $B_2^d$ which are approximately pairwise orthogonal.

Theorem 7 ([Ver01]) Let $K \subseteq \mathbb{R}^d$ be a symmetric convex body whose minimum volume enclosing ellipsoid is the unit ball $B_2^d$. Let $T$ be a linear map with spectral norm $\|T\|_2 \le 1$. Then for any $\beta$, there exist constants $C_1(\beta)$, $C_2(\beta)$ and contact points $x_1, \ldots, x_k$ with $k \ge (1 - \beta)\|T\|_F^2$ such that the matrix $TX = (Tx_i)_{i=1}^k$ satisfies

$$C_1(\beta)\frac{\|T\|_F}{\sqrt{d}} \le \sigma_{\min}(TX) \le \sigma_{\max}(TX) \le C_2(\beta)\frac{\|T\|_F}{\sqrt{d}}.$$


2.2 Statistical Estimation 

A key element in our algorithms for the sparse case is the use of least squares estimation to reduce error. Below we present a bound on the error of least squares estimation with respect to symmetric convex bodies. This analysis appears to be standard in the statistics literature; a special case of it appears, for example, in [RWY11].



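Lemma 1 Let $L \subseteq \mathbb{R}^d$ be a symmetric convex body, and let $y \in L$ and $\tilde{y} = y + w$ for some $w \in \mathbb{R}^d$. Let, finally, $\hat{y} = \arg\min_{z \in L} \|z - \tilde{y}\|_2^2$. We have $\|\hat{y} - y\|_2^2 \le \min\{4\|w\|_2^2, 4\|w\|_{L^\circ}\}$.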
Figure 1: A schematic illustration of the proof of Lemma 1. The vector $p - y$ is proportional in length to $\langle \hat{y} - y, w\rangle$ and the vector $\hat{y} - p$ is proportional in length to $\langle \hat{y} - y, \hat{y} - \tilde{y}\rangle$. As $\|w\|_2 \ge \|\hat{y} - \tilde{y}\|_2$, we have $\|p - y\|_2 \ge \|\hat{y} - p\|_2$.

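Proof: First we show the easier bound $\|y - \hat{y}\|_2 \le 2\|w\|_2$, which follows by the triangle inequality and the fact that $\hat{y}$ is the point of $L$ closest to $\tilde{y}$ while $y \in L$:
$$\|y - \hat{y}\|_2 \le \|y - \tilde{y}\|_2 + \|\tilde{y} - \hat{y}\|_2 \le 2\|y - \tilde{y}\|_2 = 2\|w\|_2.$$

The second bound is based on Hölder's inequality and the following simple but very useful fact, illustrated schematically in Figure 1:
$$\|\hat{y} - y\|_2^2 = \langle \hat{y} - y, \tilde{y} - y \rangle + \langle \hat{y} - y, \hat{y} - \tilde{y} \rangle \le 2\langle \hat{y} - y, \tilde{y} - y \rangle. \quad (3)$$

The inequality in (3) follows from
$$\langle \hat{y} - y, \tilde{y} - y \rangle = \|\tilde{y} - y\|_2^2 + \langle \hat{y} - \tilde{y}, \tilde{y} - y \rangle \ge \|\hat{y} - \tilde{y}\|_2^2 + \langle \hat{y} - \tilde{y}, \tilde{y} - y \rangle = \langle \hat{y} - \tilde{y}, \hat{y} - y \rangle,$$
where we used $\|\tilde{y} - y\|_2 = \|w\|_2 \ge \|\tilde{y} - \hat{y}\|_2$.

Inequality (3), $w = \tilde{y} - y$, and Hölder's inequality imply
$$\|y - \hat{y}\|_2^2 \le 2\langle \hat{y} - y, w \rangle \le 2\|\hat{y} - y\|_L\|w\|_{L^\circ} \le 4\|w\|_{L^\circ},$$
where the last step uses $\|\hat{y} - y\|_L \le 2$, which holds since $\hat{y}, y \in L$ and $L$ is symmetric and convex. This completes the proof. ■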
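To see Lemma 1 in the simplest case, the following sketch (ours; it assumes $L = B_1^d$, the unit $\ell_1$ ball) computes $\hat{y}$ as the Euclidean projection onto $B_1^d$ via a standard sort-and-threshold routine. Since the polar of $B_1^d$ is $B_\infty^d$, we have $\|w\|_{L^\circ} = \|w\|_\infty$, so the lemma predicts $\|y - \hat{y}\|_2^2 \le \min\{4\|w\|_2^2, 4\|w\|_\infty\}$. Function names are illustrative.

```python
import numpy as np


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {z : ||z||_1 <= radius} (sort-and-threshold)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                  # magnitudes in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * ks > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)     # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def lemma1_check(y, w):
    """Return (||y - y_hat||_2^2, min{4||w||_2^2, 4||w||_{L°}}) for L = B_1^d."""
    y_hat = project_l1_ball(y + w)                # least squares estimate over L
    lhs = float(np.sum((y - y_hat) ** 2))
    # For L = B_1^d the dual (polar) norm is the l_inf norm.
    rhs = float(min(4 * np.sum(w ** 2), 4 * np.max(np.abs(w))))
    return lhs, rhs
```

For large Gaussian noise $w$ in high dimension, the $\ell_\infty$ term is typically far smaller than $4\|w\|_2^2$; this is the gain the sparse-case algorithm exploits.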
2.3 Differential Privacy 

Following recent work in differential privacy, we model private data as a database $D$ of $n$ rows, where each row of $D$ contains information about an individual. Formally, a database $D$ is a multiset of size $n$ of elements of the universe $U = \{t_1, \ldots, t_N\}$ of possible user types. Our algorithms take as input a histogram $x \in \mathbb{R}^N$ of the database $D$, where the $i$-th component $x_i$ of $x$ encodes the number of individuals in $D$ of type $t_i$. Notice that in this histogram representation, we have $\|x\|_1 = n$ when $D$ is a database of size $n$. Also, two neighboring databases $D$ and $D'$ that differ in the presence or absence of a single individual correspond to two histograms $x$ and $x'$ satisfying $\|x - x'\|_1 = 1$.
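The histogram representation can be sketched in a few lines of NumPy (a hypothetical helper; user types are encoded as integers $0, \ldots, N-1$). Adding one individual changes exactly one coordinate by one, so neighboring databases give histograms at $\ell_1$ distance 1.

```python
import numpy as np


def histogram(database, num_types):
    """Histogram x in R^N of a database given as a list of type indices 0..N-1."""
    return np.bincount(np.asarray(database, dtype=int), minlength=num_types).astype(float)
```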

Throughout most of this paper, we work under the notion of approximate differential privacy. The definition follows.

Definition 1 ([DMNS06, DKM+06b]) A (randomized) algorithm $\mathcal{M}$ with input domain $\mathbb{R}^N$ and output range $Y$ is $(\varepsilon, \delta)$-differentially private if for every $n$, every $x, x'$ with $\|x - x'\|_1 \le 1$, and every measurable $S \subseteq Y$, $\mathcal{M}$ satisfies
$$\Pr[\mathcal{M}(x) \in S] \le e^{\varepsilon}\Pr[\mathcal{M}(x') \in S] + \delta.$$
When $\delta = 0$, we are in the regime of pure differential privacy.

An important basic property of differential privacy is that the privacy guarantees degrade smoothly under composition and are not affected by post-processing.



Lemma 2 ([DMNS06, DKM+06b]) Let $\mathcal{M}_1$ and $\mathcal{M}_2$ satisfy $(\varepsilon_1, \delta_1)$- and $(\varepsilon_2, \delta_2)$-differential privacy, respectively. Then the algorithm which on input $x$ outputs the tuple $(\mathcal{M}_1(x), \mathcal{M}_2(\mathcal{M}_1(x), x))$ satisfies $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-differential privacy.

2.3.1 Optimality for Linear Queries



In this paper we study the necessary and sufficient error incurred by differentially private algorithms for approximating linear queries. A set of $d$ linear queries is given by a $d \times N$ query matrix or workload $A$; the exact answers to the queries on a histogram $x$ are given by the $d$-dimensional vector $y = Ax$.

We define error as total squared error. More precisely, for an algorithm $\mathcal{M}$ and a subset $X \subseteq \mathbb{R}^N$, we define
$$\mathrm{err}_{\mathcal{M}}(A, X) = \sup_{x \in X}\mathbb{E}\|Ax - \mathcal{M}(A, x)\|_2^2.$$

We also write $\mathrm{err}_{\mathcal{M}}(A, nB_1^N)$ as $\mathrm{err}_{\mathcal{M}}(A, n)$. The optimal error achievable by any $(\varepsilon, \delta)$-differentially private algorithm for queries $A$ and databases of size up to $n$ is
$$\mathrm{opt}_{\varepsilon,\delta}(A, n) = \inf_{\mathcal{M}}\mathrm{err}_{\mathcal{M}}(A, n),$$
where the infimum is taken over all $(\varepsilon, \delta)$-differentially private algorithms. When no restrictions are placed on the size $n$ of the database, the appropriate notion of optimal error is $\mathrm{opt}_{\varepsilon,\delta}(A) = \sup_n\mathrm{opt}_{\varepsilon,\delta}(A, n)$. Similarly, for an algorithm $\mathcal{M}$, the error when the database size is not bounded is $\mathrm{err}_{\mathcal{M}}(A) = \sup_n\mathrm{err}_{\mathcal{M}}(A, n)$. A priori it is not clear that these quantities are finite, but we will show that this is the case.

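In order to get tight dependence on the privacy parameter $\varepsilon$ in our analyses, we will use the following relationship between $\mathrm{opt}_{\varepsilon,\delta}(A, n)$ and $\mathrm{opt}_{\varepsilon',\delta'}(A, n)$.

Lemma 3 For any $\varepsilon$, any $\delta < 1$, any integer $k$, and for $\delta' \ge \frac{e^{k\varepsilon} - 1}{e^{\varepsilon} - 1}\delta$,
$$\mathrm{opt}_{\varepsilon,\delta}(A, n) \ge k^2\,\mathrm{opt}_{k\varepsilon,\delta'}(A, n/k).$$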



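Proof: Let $\mathcal{M}$ be an $(\varepsilon, \delta)$-differentially private algorithm achieving $\mathrm{opt}_{\varepsilon,\delta}(A, n)$. We will use $\mathcal{M}$ as a black box to construct a $(k\varepsilon, \delta')$-differentially private algorithm $\mathcal{M}'$ which satisfies the error guarantee $\mathrm{err}_{\mathcal{M}'}(A, n/k) \le \frac{1}{k^2}\mathrm{err}_{\mathcal{M}}(A, n)$.

The algorithm $\mathcal{M}'$ on input $x$ satisfying $\|x\|_1 \le n/k$ outputs $\frac{1}{k}\mathcal{M}(A, kx)$. We need to show that $\mathcal{M}'$ satisfies $(k\varepsilon, \delta')$-differential privacy. Let $x$ and $x'$ be two neighboring inputs to $\mathcal{M}'$, i.e. $\|x - x'\|_1 \le 1$, and let $S$ be a measurable subset of the output range of $\mathcal{M}'$. Denote $p_1 = \Pr[\mathcal{M}'(x) \in S]$ and $p_2 = \Pr[\mathcal{M}'(x') \in S]$. We need to show that $p_1 \le e^{k\varepsilon}p_2 + \delta'$. To that end, define $x_0 = kx$, $x_1 = kx + (x' - x)$, $x_2 = kx + 2(x' - x)$, $\ldots$, $x_k = kx'$. Applying the $(\varepsilon, \delta)$-privacy guarantee of $\mathcal{M}$ to each of the pairs of neighboring inputs $(x_0, x_1), (x_1, x_2), \ldots, (x_{k-1}, x_k)$ in sequence gives us
$$p_1 \le e^{k\varepsilon}p_2 + \left(1 + e^{\varepsilon} + \cdots + e^{(k-1)\varepsilon}\right)\delta = e^{k\varepsilon}p_2 + \frac{e^{k\varepsilon} - 1}{e^{\varepsilon} - 1}\delta.$$

This finishes the proof of privacy for $\mathcal{M}'$. It is straightforward to verify that $\mathrm{err}_{\mathcal{M}'}(A, n/k) \le \frac{1}{k^2}\mathrm{err}_{\mathcal{M}}(A, n)$: for $\|x\|_1 \le n/k$ we have $\|kx\|_1 \le n$ and $\mathbb{E}\|Ax - \frac{1}{k}\mathcal{M}(A, kx)\|_2^2 = \frac{1}{k^2}\mathbb{E}\|A(kx) - \mathcal{M}(A, kx)\|_2^2$. ■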
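The arithmetic in the proof of Lemma 3 can be checked directly. The sketch below (hypothetical helper names) compares the closed form for $\delta'$ with the step-by-step accumulation along the chain $x_0, \ldots, x_k$, and builds $\mathcal{M}'(x) = \frac{1}{k}\mathcal{M}(A, kx)$ from any mechanism given as a Python callable.

```python
import math


def delta_prime(eps, delta, k):
    """Closed form (e^{k eps} - 1) / (e^{eps} - 1) * delta from Lemma 3."""
    return (math.exp(k * eps) - 1.0) / (math.exp(eps) - 1.0) * delta


def chained_delta(eps, delta, k):
    """Accumulate the additive term along x_0, ..., x_k one neighboring step at a time."""
    total = 0.0
    for i in range(k):
        total += math.exp(i * eps) * delta     # step x_i -> x_{i+1} contributes e^{i eps} delta
    return total


def scaled_mechanism(mechanism, k):
    """M'(A, x) = M(A, k x) / k; x may be a float or a NumPy array."""
    return lambda A, x: mechanism(A, k * x) / k
```

For a mechanism with a fixed additive error, the rescaled mechanism's squared error shrinks by exactly $k^2$, which is the error half of the lemma.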

Above, we state the error and optimal error definitions for histograms $x$, which can be arbitrary real vectors. All our algorithms work in this general setting. Recall, however, that the histograms arising from our definition of databases are integer vectors. Our lower bounds hold against integer histograms as well. Therefore, defining err and opt in terms of integer histograms (i.e. taking $\mathrm{err}_{\mathcal{M}}(A, n) = \mathrm{err}_{\mathcal{M}}(A, nB_1^N \cap \mathbb{N}^N)$) does not change the asymptotics of our theorems.

2.3.2 Gaussian Noise Mechanism 

A basic mechanism for achieving $(\varepsilon, \delta)$-differential privacy for linear queries is adding appropriately scaled independent Gaussian noise to each query. This approach goes back to the work of Blum et al. [BDMN05], predating the definition of differential privacy. Next we define this basic mechanism formally and give a privacy guarantee. The privacy analysis of the Gaussian mechanism in the context of $(\varepsilon, \delta)$-differential privacy was first given in [DKM+06b]. We give the full proof here for completeness.



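Lemma 4 Let $A = (a_i)_{i=1}^N$ be a $d \times N$ matrix such that $\forall i: \|a_i\|_2 \le \sigma$. Then a mechanism which on input $x \in \mathbb{R}^N$ outputs $Ax + w$, where $w \sim N\big(0, \sigma(1 + \sqrt{2\ln(1/\delta)})/\varepsilon\big)^d$, satisfies $(\varepsilon, \delta)$-differential privacy. (Here $N(0, s)^d$ denotes the distribution of $d$ independent Gaussians with mean $0$ and standard deviation $s$.)

Proof: Let $C = (1 + \sqrt{2\ln(1/\delta)})/\varepsilon$ and let $p$ be the probability density function of $N(0, C\sigma)^d$. Let also $K = AB_1^N$, so $\|x - x'\|_1 \le 1$ implies $A(x - x') \in K$. Define
$$D_v(w) = \ln\frac{p(w)}{p(w + v)}.$$

We will prove that when $w \sim N(0, C\sigma)^d$, for all $v \in K$, $\Pr[|D_v(w)| > \varepsilon] \le \delta$. This suffices to prove $(\varepsilon, \delta)$-differential privacy. Indeed, let the algorithm output $Ax + w$ and fix any $x'$ such that $\|x - x'\|_1 \le 1$. Let $v = A(x - x') \in K$ and $S = \{w : |D_v(w)| > \varepsilon\}$. For any measurable $T \subseteq \mathbb{R}^d$ we have
$$\begin{aligned}
\Pr[Ax + w \in T] &= \Pr[w \in T - Ax] = \int_{S \cap (T - Ax)}p(w)\,dw + \int_{\bar{S} \cap (T - Ax)}p(w)\,dw \\
&\le \delta + e^{\varepsilon}\int_{\bar{S} \cap (T - Ax)}p(w + v)\,dw \le \delta + e^{\varepsilon}\int_{T - Ax'}p(w)\,dw \\
&= \delta + e^{\varepsilon}\Pr[w \in T - Ax'] = \delta + e^{\varepsilon}\Pr[Ax' + w \in T],
\end{aligned}$$
where the last inequality substitutes $w + v$ for $w$ and uses $(T - Ax) + v = T - Ax'$.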

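We fix an arbitrary $v \in K$ and proceed to prove $|D_v(w)| \le \varepsilon$ with probability at least $1 - \delta$. We will first compute $D_v(w)$ explicitly and then apply a tail bound. Recall that $p(w) \propto \exp\big(-\|w\|_2^2/(2C^2\sigma^2)\big)$. Notice also that, since $v \in K$ can be written as $\sum_{i=1}^N\beta_ia_i$ where $\sum_i|\beta_i| \le 1$, we have $\|v\|_2 \le \sigma$. Then we can write
$$D_v(w) = \frac{\|v + w\|_2^2 - \|w\|_2^2}{2C^2\sigma^2} = \frac{\|v\|_2^2}{2C^2\sigma^2} + \frac{v^{\intercal}w}{C^2\sigma^2} \le \frac{1}{2C^2} + \frac{v^{\intercal}w}{C^2\sigma^2}.$$

Note that to bound $|D_v(w)|$ we simply need to bound $\frac{1}{C^2\sigma^2}v^{\intercal}w$ from above and below. Since $v^{\intercal}w$ is Gaussian with mean $0$ and standard deviation $\|v\|_2C\sigma \le C\sigma^2$, we can apply a Chernoff bound and we get
$$\Pr\left[\frac{|v^{\intercal}w|}{C^2\sigma^2} > \frac{\sqrt{2\ln(1/\delta)}}{C}\right] \le \delta.$$

Therefore, with probability at least $1 - \delta$,
$$-\frac{\sqrt{2\ln(1/\delta)}}{C} \le D_v(w) \le \frac{1/(2C) + \sqrt{2\ln(1/\delta)}}{C}.$$

Substituting $C = (1 + \sqrt{2\ln(1/\delta)})/\varepsilon$ (and using $C \ge 1/2$, which holds whenever $\varepsilon \le 2$) completes the proof. ■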
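A minimal NumPy sketch of the mechanism of Lemma 4 follows (function names are ours). It sets $\sigma$ to the largest column norm of $A$ and uses noise standard deviation $\sigma\,c(\varepsilon,\delta)$ with $c(\varepsilon,\delta) = (1 + \sqrt{2\ln(1/\delta)})/\varepsilon$. As a check on the proof, `privacy_loss_tail` evaluates $\Pr[|D_v(w)| > \varepsilon]$ exactly for a $v$ with $\|v\|_2 = \sigma$, using the decomposition $D_v(w) = 1/(2C^2) + Z/C$ with $Z$ standard normal, which follows from the formula for $D_v(w)$ above.

```python
import math
import numpy as np


def noise_multiplier(eps, delta):
    """c(eps, delta) = (1 + sqrt(2 ln(1/delta))) / eps."""
    return (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps


def gaussian_mechanism(A, x, eps, delta, rng):
    """Lemma 4: output Ax + w, w ~ N(0, sigma * c(eps, delta))^d, sigma = max_i ||a_i||_2."""
    sigma = np.linalg.norm(A, axis=0).max()
    return A @ x + rng.normal(0.0, sigma * noise_multiplier(eps, delta), size=A.shape[0])


def privacy_loss_tail(eps, delta):
    """Exact Pr[|D_v(w)| > eps] when ||v||_2 = sigma.

    With C = c(eps, delta), D_v(w) = 1/(2C^2) + Z/C for a standard normal Z.
    """
    C = noise_multiplier(eps, delta)
    hi = C * eps - 1.0 / (2.0 * C)         # D > eps   <=>  Z > hi
    lo = C * eps + 1.0 / (2.0 * C)         # D < -eps  <=>  Z < -lo
    return 0.5 * math.erfc(hi / math.sqrt(2.0)) + 0.5 * math.erfc(lo / math.sqrt(2.0))
```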
The following corollary is a useful geometric generalization of Lemma 4.

Corollary 8 Let $A = (a_i)_{i=1}^N$ be a $d \times N$ matrix of rank $d$ and let $K = \mathrm{sym}\{a_1, \ldots, a_N\}$. Let $E = FB_2^d$ ($F$ is a linear map) be an ellipsoid containing $K$. Then a mechanism that outputs $Ax + Fw$, where $w \sim N\big(0, (1 + \sqrt{2\ln(1/\delta)})/\varepsilon\big)^d$, satisfies $(\varepsilon, \delta)$-differential privacy.

Proof: Since $K$ is full dimensional (by $\mathrm{rank}\,A = d$) and $E$ contains $K$, $E$ is full dimensional as well, and, therefore, $F$ is an invertible linear map. Define $G = F^{-1}A$. Since each column $a_i$ lies in $K \subseteq FB_2^d$, each column $g_i = F^{-1}a_i$ of $G$ satisfies $\|g_i\|_2 \le 1$. Therefore, by Lemma 4 (with $\sigma = 1$), a mechanism that outputs $Gx + w$ (where $w$ is distributed as in the statement of the corollary) satisfies $(\varepsilon, \delta)$-differential privacy. Therefore, $FGx + Fw = Ax + Fw$ is $(\varepsilon, \delta)$-differentially private by the post-processing property of differential privacy. ■
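Corollary 8 can be sketched the same way. In this hypothetical helper, $F$ is assumed to be given as an invertible $d \times d$ matrix; containment $K \subseteq FB_2^d$ is checked through the column norms of $G = F^{-1}A$, exactly as in the proof, and the output is $Ax + Fw$.

```python
import math
import numpy as np


def ellipsoid_mechanism(A, x, F, eps, delta, rng, tol=1e-9):
    """Output Ax + F w, w ~ N(0, c(eps, delta))^d, for an ellipsoid E = F B_2^d containing K.

    E contains K = sym{a_1, ..., a_N} iff every column satisfies ||F^{-1} a_j||_2 <= 1,
    which is the column-norm condition Lemma 4 needs for G = F^{-1} A (sigma = 1).
    """
    G = np.linalg.solve(F, A)                          # G = F^{-1} A
    if np.linalg.norm(G, axis=0).max() > 1.0 + tol:
        raise ValueError("ellipsoid F B_2^d does not contain K")
    c = (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps
    w = rng.normal(0.0, c, size=A.shape[0])
    return A @ x + F @ w                               # equals F (Gx + w): post-processing
```

With $F = \sigma I$ this reduces to the mechanism of Lemma 4; a better-fitting ellipsoid gives noise that is smaller in the directions where $K$ is thin.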

We present a composition theorem specific to composing Gaussian noise mechanisms. We note that a similar composition result in a much more general setting, but with slightly inferior dependence on the parameters, is proven in [DRV10].

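Corollary 9 Let $V_1, \ldots, V_k$ be vector spaces of respective dimensions $d_1, \ldots, d_k$, such that $V_{i+1} \subseteq (V_1 + \cdots + V_i)^{\perp}$ for all $i \le k - 1$ (i.e. the spaces are pairwise orthogonal), and $d_1 + \cdots + d_k = d$. Let $A = (a_i)_{i=1}^N$ be a $d \times N$ matrix of rank $d$ and let $K = \mathrm{sym}\{a_1, \ldots, a_N\}$. Let $\Pi_i$ be the projection matrix for $V_i$ and let $E_i = F_iB_2^{d_i} \subseteq V_i$ be an ellipsoid such that $\Pi_iK \subseteq E_i$. Then the mechanism that outputs $Ax + \sqrt{k}\sum_{i=1}^kF_iw_i$, where for each $i$, $w_i \sim N\big(0, (1 + \sqrt{2\ln(1/\delta)})/\varepsilon\big)^{d_i}$, satisfies $(\varepsilon, \delta)$-differential privacy.

Proof: Let $c(\varepsilon, \delta) = (1 + \sqrt{2\ln(1/\delta)})/\varepsilon$. Since the random variables $F_1w_1, \ldots, F_kw_k$ are independent Gaussian random variables, and $F_iw_i$ has covariance matrix $c(\varepsilon, \delta)^2F_iF_i^{\intercal}$, we have that $w = \sqrt{k}\sum_{i=1}^kF_iw_i$ is a Gaussian random variable with covariance $c(\varepsilon, \delta)^2GG^{\intercal}$, where $G$ is any matrix satisfying $GG^{\intercal} = k\sum_{i=1}^kF_iF_i^{\intercal}$. By Corollary 8, it is sufficient to show that the ellipsoid $E = GB_2^d$ contains $K$. By convex duality this is equivalent to showing $E^\circ \subseteq K^\circ$, which is in turn equivalent to $\forall x: \|x\|_{K^\circ} \le \|x\|_{E^\circ}$. Recalling that $\|x\|_{E^\circ}^2 = x^{\intercal}GG^{\intercal}x$ and $\|x\|_{K^\circ} = \max_{y \in K}\langle y, x \rangle = \max_j|\langle a_j, x \rangle|$, we need to establish
$$\forall x \in \mathbb{R}^d,\ \forall j \in [N]: \quad \langle a_j, x \rangle^2 \le x^{\intercal}GG^{\intercal}x. \quad (4)$$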

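We proceed by establishing (4). Since for all $i$, $\Pi_iK \subseteq E_i$, by duality and the same reasoning as above, we have that for all $i$ and $j$, $\langle \Pi_ia_j, x \rangle^2 \le x^{\intercal}F_iF_i^{\intercal}x$. Since the spaces $V_i$ are pairwise orthogonal and their dimensions sum to $d$, we have $a_j = \sum_{i=1}^k\Pi_ia_j$. Therefore, by the Cauchy-Schwarz inequality,
$$\langle a_j, x \rangle^2 = \Big(\sum_{i=1}^k\langle \Pi_ia_j, x \rangle\Big)^2 \le k\sum_{i=1}^k\langle \Pi_ia_j, x \rangle^2 \le k\sum_{i=1}^kx^{\intercal}F_iF_i^{\intercal}x = x^{\intercal}GG^{\intercal}x.$$

This completes the proof. ■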
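The key inequality (4) in the proof of Corollary 9 is easy to test numerically. The sketch below (ours) takes pairwise orthogonal blocks $U_i$ with orthonormal columns, uses $F_i = r_iU_i$ with $r_i = \max_j\|U_i^{\intercal}a_j\|_2$ (the choice later made in Algorithm 2), forms $GG^{\intercal} = k\sum_iF_iF_i^{\intercal}$, and checks both $K \subseteq GB_2^d$ and inequality (4) on random directions.

```python
import numpy as np


def composed_noise_cov(A, blocks):
    """G G^T = k * sum_i F_i F_i^T with F_i = r_i U_i and r_i = max_j ||U_i^T a_j||_2.

    `blocks` are d x d_i matrices with orthonormal columns spanning pairwise
    orthogonal subspaces whose dimensions sum to d.
    """
    k, d = len(blocks), A.shape[0]
    GGt = np.zeros((d, d))
    for U in blocks:
        r = np.linalg.norm(U.T @ A, axis=0).max()   # then Pi_i K lies in E_i = r_i U_i B_2^{d_i}
        F = r * U
        GGt += F @ F.T
    return k * GGt


def contains_K(GGt, A, tol=1e-9):
    """K = sym{a_j} lies in E = G B_2^d iff a_j^T (G G^T)^{-1} a_j <= 1 for all j."""
    vals = np.einsum("ij,ij->j", A, np.linalg.solve(GGt, A))
    return bool(vals.max() <= 1.0 + tol)
```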
2.3.3 Noise Lower Bounds 

We will make extensive use of a lower bound on the noise complexity of $(\varepsilon, \delta)$-differentially private mechanisms in terms of combinatorial discrepancy. First we need to define the notion of hereditary $\alpha$-discrepancy:
$$\mathrm{herdisc}_\alpha(A, n) = \max_{S \subseteq [N]: |S| \le n}\ \min_{\substack{x \in \{-1, 0, +1\}^S \\ \|x\|_1 \ge \alpha|S|}}\|A|_Sx\|_2.$$

We denote $\mathrm{herdisc}(A) = \max_n\mathrm{herdisc}_1(A, n)$. An equivalent notation is $\mathrm{herdisc}^{\ell_2}(A)$. When the $\ell_2$ norm is substituted with $\ell_\infty$, we have the classical notion of hereditary discrepancy, here denoted $\mathrm{herdisc}^{\ell_\infty}(A)$.
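For very small matrices, $\mathrm{herdisc}_\alpha(A, n)$ can be evaluated by brute force directly from the definition. The sketch below (exponential time, for illustration only; names are ours, and it assumes $0 < \alpha \le 1$ so that every inner minimum ranges over a nonempty set) enumerates all column subsets and all partial colorings.

```python
import itertools
import numpy as np


def herdisc_alpha(A, n, alpha=1.0):
    """Brute-force herdisc_alpha(A, n) in the l_2 norm, straight from the definition.

    Exponential time: only for matrices with a handful of columns.
    """
    N = A.shape[1]
    best = 0.0                                     # S = empty set contributes 0
    for size in range(1, min(n, N) + 1):
        for S in itertools.combinations(range(N), size):
            A_S = A[:, list(S)]
            inner = min(
                float(np.linalg.norm(A_S @ np.array(x, dtype=float)))
                for x in itertools.product((-1, 0, 1), repeat=size)
                if sum(abs(v) for v in x) >= alpha * size      # ||x||_1 >= alpha |S|
            )
            best = max(best, inner)
    return best
```

For example, a single all-ones row has hereditary discrepancy 1: any set of odd size cannot be balanced exactly, while even sets can.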

Next we present the lower bound, which is a simple extension of the discrepancy lower bound on noise recently proved by Muthukrishnan and Nikolov [MN12].

Theorem 10 ([MN12]) Let $A$ be a $d \times N$ real matrix. For any constant $\alpha$ and sufficiently small constants $\varepsilon \le \varepsilon(\alpha)$ and $\delta \le \delta(\alpha)$,
$$\mathrm{opt}_{\varepsilon,\delta}(A, n) = \Omega(1)\cdot\mathrm{herdisc}_\alpha(A, n)^2.$$

We further develop two lower bounds for $\mathrm{herdisc}_\alpha(A, n)$ which are more convenient to work with. The first lower bound uses spectral techniques. Observe first that, since the $\ell_2$-norm of any vector does not increase under projection, we have $\mathrm{herdisc}_\alpha(A, n) \ge \mathrm{herdisc}_\alpha(\Pi A, n)$ for any projection matrix $\Pi$. Furthermore, recall that for a matrix $M$, $\sigma_{\min}(M) = \min_{x: \|x\|_2 = 1}\|Mx\|_2$. For any $x \in \{-1, 0, 1\}^S$ satisfying $\|x\|_1 \ge \alpha|S|$, we have $\|x\|_2^2 = \|x\|_1 \ge \alpha|S|$, and hence $\|A|_Sx\|_2^2 \ge \alpha|S|\,\sigma_{\min}^2(A|_S)$. Therefore,
$$\mathrm{herdisc}_\alpha(A, n)^2 \ge \max_{S \subseteq [N]: |S| \le n}\alpha|S|\,\sigma_{\min}^2(A|_S). \quad (5)$$

Let us define
$$\mathrm{specLB}(A, n) = \max_{\substack{S \subseteq [N] \\ |S| = k \le n}}\ \max_{\Pi \in \mathcal{P}_k}k\,\sigma_{\min}^2(\Pi A|_S).$$

Substituting (5), applied to $\Pi A$, into Theorem 10, we have that there exist constants $c_1$ and $c_2$ such that
$$\mathrm{opt}_{c_1,c_2}(A, n) = \Omega(1)\cdot\mathrm{specLB}(A, n). \quad (6)$$

For the remainder of this paper we fix some constants $c_1$ and $c_2$ for which (6) holds. Similarly to the notation for opt, we will also sometimes denote $\mathrm{specLB}(A) = \max_n\mathrm{specLB}(A, n)$. We will primarily use the spectral lower bound (6) to argue the optimality of our algorithms.

To show the small gap between approximate and pure privacy (Theorem 2), we next develop a determinant-based lower bound. We first switch from $\mathrm{herdisc}_\alpha$ to the classical notion of hereditary discrepancy, equivalent to $\mathrm{herdisc}_1$, by observing the following relation between $\mathrm{herdisc}_\alpha$ and $\mathrm{herdisc}_1$ from [MN12]:
$$\mathrm{herdisc}_1(A, n) \le \frac{\log n}{\alpha}\,\mathrm{herdisc}_\alpha(A, n). \quad (7)$$

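We then use an extension of the classical determinant lower bound for hereditary discrepancy, due to Lovász, Spencer, and Vesztergombi.

Theorem 11 ([LSV86]) For any real $d \times N$ matrix $A$,
$$(\mathrm{herdisc}_1(A, n))^2 \ge \Omega(1)\cdot\mathrm{detLB}(A, n), \quad \text{where} \quad \mathrm{detLB}(A, n) = \max_{k \le n}\ \max_{\substack{S \subseteq [N] \\ |S| = k}}\ \max_{\Pi \in \mathcal{P}_k}k\,|\det(\Pi A|_S)|^{2/k}.$$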



Proof: The proof proceeds in two steps: showing that a quantity known as linear discrepancy is at most a constant factor larger than hereditary discrepancy, and lower bounding linear discrepancy in terms of detLB. Most of the proof can be adapted with little modification from proofs of the lower bound on discrepancy in [LSV86]. Here we follow the exposition in [Cha00].



Let us first define linear discrepancy. For a $d \times k$ matrix $M$ and $c \in [-1, 1]^k$, let $\mathrm{disc}_c(M)$ be defined as
$$\mathrm{disc}_c(M) = \min_{x \in \{-1, 1\}^k}\|Mx - Mc\|_2.$$

The linear discrepancy of $M$ is then defined as $\mathrm{lindisc}(M) = \max_{c \in [-1, 1]^k}\mathrm{disc}_c(M)$. We claim that for any $M$,
$$\mathrm{lindisc}(M) \le 2\,\mathrm{herdisc}(M). \quad (8)$$



The bound (8) is proven for the more common variants of lindisc and herdisc in [Cha00], but the proof can be seen to apply without modification to discrepancy defined with respect to any norm. For completeness, we give the full argument here. For $c \in \{-1, 0, 1\}^k$, we have $\mathrm{disc}_c(M) \le \mathrm{herdisc}(M)$: take $x$ equal to $c$ on the support of $c$, and on the remaining coordinates use a coloring of the corresponding column submatrix with discrepancy at most $\mathrm{herdisc}(M)$. Call a vector $c \in [-1, 1]^k$ $q$-integral if every coordinate $c_j$ of $c$ satisfies $|c_j| = \sum_{l=0}^qb_{jl}2^{-l}$, where $(b_{jl})_{l=0}^q \in \{0, 1\}^{q+1}$. In other words, $c$ is $q$-integral if the binary expansion of every coordinate of $c$ is identically zero after the $q$-th digit. The bound (8) holds for 0-integral $c$, and we prove that it holds for $q$-integral $c$ by induction on $q$. For the induction step, assume that the bound holds for any $(q-1)$-integral $c'$ and let $c$ be $q$-integral. Define $s \in \{-1, 1\}^k$ by setting $s_j = -1$ if $c_j > 0$ and $s_j = 1$ otherwise. Then $c' = 2c + s \in [-1, 1]^k$ is $(q-1)$-integral, and, by the induction hypothesis, there exists $x_1 \in \{-1, 1\}^k$ such that $\|Mx_1 - Mc'\|_2 \le 2\,\mathrm{herdisc}(M)$. Let $c'' = \frac{x_1 - s}{2} \in \{-1, 0, 1\}^k$. Dividing by 2 and rearranging, we have $\|Mc'' - Mc\|_2 \le \mathrm{herdisc}(M)$. The vector $c''$ is 0-integral, and therefore there exists some $x_2 \in \{-1, 1\}^k$ such that $\|Mx_2 - Mc''\|_2 \le \mathrm{herdisc}(M)$. By the triangle inequality, $\|Mx_2 - Mc\|_2 \le 2\,\mathrm{herdisc}(M)$, and this completes the inductive step.

We complete the proof of the theorem by proving a lower bound on lindisc in terms of detLB. We note that a similar lower bound can be proved for any variant of lindisc defined in terms of any norm. The exact lower bound will depend on the volume radius of the unit ball of the norm. Since the proof of (8) also works for any norm, we get a determinant lower bound for hereditary discrepancy defined in terms of any norm as well.

We show that for any $d \times k$ matrix $M$,
$$\mathrm{lindisc}(M) = \Omega(1)\cdot\max_{\Pi \in \mathcal{P}_k}\sqrt{k}\,|\det(\Pi M)|^{1/k}. \quad (9)$$

Letting $k$ range over $[n]$, $M$ range over all $d \times k$ submatrices of $A$, and applying the bounds (8) and (9) implies the theorem.

We proceed to prove (9). Note that if $\mathrm{rank}(M) < k$, (9) is trivially true; therefore, we may assume that $\mathrm{rank}(M) = k$. Note also that without loss of generality we can take $\Pi$ to be the orthogonal projection onto the range of $M$, since this is the projection operator that maximizes $|\det(\Pi M)|$. Let $E$ be the ellipsoid $E = \{x \in \mathbb{R}^k : \|Mx\|_2 \le 1\}$. The inequality $\mathrm{lindisc}(M) \le D$ is equivalent to
$$[-1, 1]^k \subseteq \bigcup_{x \in \{-1, 1\}^k}(D\cdot E + x). \quad (10)$$

Thus $2^k = \mathrm{vol}([-1, 1]^k) \le 2^k\,\mathrm{vol}(D\cdot E) = 2^kD^k\,\mathrm{vol}(E)$, and therefore $D^k \ge 1/\mathrm{vol}(E)$. On the other hand, the volume of $E$ is equal to
$$\mathrm{vol}(E) = \frac{\mathrm{vol}(B_2^k)}{|\det(M^{\intercal}M)|^{1/2}} = \frac{\mathrm{vol}(B_2^k)}{|\det(\Pi M)|}.$$

Applying the standard estimate $\mathrm{vol}(B_2^k)^{1/k} = \Theta(k^{-1/2})$ completes the proof. ■
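The standard estimate used at the end of the proof can be checked numerically: with $\mathrm{vol}(B_2^k) = \pi^{k/2}/\Gamma(k/2 + 1)$, the product $\sqrt{k}\,\mathrm{vol}(B_2^k)^{1/k}$ stays between 2 (at $k = 1$) and $\sqrt{2\pi e}$. A short sketch (helper name ours) that works in log space to avoid overflow:

```python
import math


def ball_volume_root(k):
    """vol(B_2^k)^(1/k), using vol(B_2^k) = pi^(k/2) / Gamma(k/2 + 1) in log space."""
    log_vol = 0.5 * k * math.log(math.pi) - math.lgamma(0.5 * k + 1.0)
    return math.exp(log_vol / k)
```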

By Theorem 10, (7), and the determinant lower bound of Theorem 11, we get our determinant lower bound on the noise necessary for privacy. For some constants $c_1, c_2 > 0$,
$$\mathrm{opt}_{c_1,c_2}(A, n) = \Omega\Big(\frac{1}{\log^2n}\Big)\,\mathrm{detLB}(A, n). \quad (11)$$
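The quantity $\mathrm{detLB}(A, n)$ can also be evaluated by brute force on tiny examples. Under the assumption that $\mathcal{P}_k$ consists of orthogonal projections onto $k$-dimensional subspaces, the best projection for a fixed $S$ is onto the range of $A|_S$ (projection does not increase $k$-dimensional volume), and then $|\det(\Pi A|_S)| = \det(A|_S^{\intercal}A|_S)^{1/2}$. The sketch (ours) also illustrates $\mathrm{specLB} \le \mathrm{detLB}$, since $\sigma_{\min}^2$ is at most the geometric mean of the squared singular values.

```python
import itertools
import numpy as np


def det_lb(A, n):
    """Brute-force detLB(A, n) = max over |S| = k <= n of k * |det(Pi A|_S)|^(2/k).

    Assumes Pi ranges over projections onto k-dimensional subspaces; the best
    choice is the range of A|_S, where |det(Pi A|_S)| = det(A|_S^T A|_S)^(1/2).
    """
    d, N = A.shape
    best = 0.0
    for k in range(1, min(n, N, d) + 1):
        for S in itertools.combinations(range(N), k):
            M = A[:, list(S)]
            gram = max(float(np.linalg.det(M.T @ M)), 0.0)   # clip round-off below 0
            best = max(best, k * gram ** (1.0 / k))          # |det|^(2/k) = gram^(1/k)
    return best
```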



Finally, we recall the stronger volume lower bound against $(\varepsilon, 0)$-differential privacy from [HT10, BDKT12]. This lower bound is nearly optimal for $(\varepsilon, 0)$-differential privacy, but does not hold for $(\varepsilon, \delta)$-differential privacy when $\delta$ is $2^{-o(d)}$.



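Theorem 12 ([HT10, BDKT12]) For any $d \times N$ real matrix $A = (a_i)_{i=1}^N$,
$$\mathrm{opt}_\varepsilon\Big(A, \frac{d}{\varepsilon}\Big) \ge \mathrm{volLB}(A, \varepsilon) = \max_{k=1}^{d}\ \max_{\Pi \in \mathcal{P}_k}\frac{k^2}{\varepsilon^2}\,\mathrm{vrad}(\Pi K)^2, \quad (12)$$
where $K = \mathrm{sym}\{a_i\}_{i=1}^N$.

Furthermore, there exists an efficient mechanism $\mathcal{M}_K$ (the generalized K-norm mechanism) which is $(\varepsilon, 0)$-differentially private and satisfies $\mathrm{err}_{\mathcal{M}_K}(A) = O(\log^3d)\,\mathrm{volLB}(A, \varepsilon)$.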

3 Algorithms for Approximate Privacy 

In this section we present our main results: efficient, nearly optimal algorithms for approximate privacy in the cases of dense databases ($n > d/\varepsilon$) and sparse databases ($n = o(d/\varepsilon)$). Both algorithms rely on recursively computing an orthonormal basis for $\mathbb{R}^d$, based on the minimum volume enclosing ellipsoid of the columns of the query matrix $A$. We first present the algorithm for computing this basis, together with a property essential for the analyses of the two algorithms presented next.

3.1 The Base Decomposition Algorithm 

We first present an algorithm (Algorithm 1) that, given a matrix $A \in \mathbb{R}^{d \times N}$, computes a set of orthonormal matrices $U_1, \ldots, U_k$, where $k \le \lceil 1 + \log d\rceil$. For each $i \ne j$, $U_i^{\intercal}U_j = 0$, and the union of the columns of $U_1, \ldots, U_k$ forms an orthonormal basis for $\mathbb{R}^d$. Thus, Algorithm 1 computes a basis for $\mathbb{R}^d$, and partitions ("decomposes") it into $k = O(\log d)$ bases of mutually orthogonal subspaces. This set of bases also induces a decomposition of $A$ into $A = A_1 + \cdots + A_k$, where $A_i = U_iU_i^{\intercal}A$. The base decomposition of Algorithm 1 is essential to both our dense case and sparse case algorithms. Intuitively, for both cases we can show that the error of a simple mechanism applied to $A_i$ can be matched by an error lower bound for $A_{i+1} + \cdots + A_k$. The error lower bounds are based on the spectral lower bound specLB on discrepancy; the geometric properties of the minimum enclosing ellipsoid of a convex body, together with the restricted invertibility principle of Bourgain and Tzafriri, are key in deriving the lower bounds.

The next lemma captures the technical property of the decomposition of Algorithm 1 that allows us to prove matching upper and lower error bounds for our dense and sparse case algorithms.

Lemma 5 Let $d_i$ be the dimension of the span of $U_i$. Furthermore, for $i < k$, let $W_i = (U_{i+1}, \ldots, U_k)$ be the matrix whose columns are the columns of $U_{i+1}, \ldots, U_k$, and let $W_k = U_k$. For every $i \le k$, there exists a set $S_i \subseteq [N]$ such that $|S_i| = \Omega(d_i)$ and $\sigma_{\min}^2(W_iW_i^{\intercal}A|_{S_i}) = \Omega(1)\cdot\max_{j=1}^N\|U_i^{\intercal}a_j\|_2^2$.

Proof: Let us, for ease of notation, assume that $d$ is a power of 2. We prove that there exists a set $S$, $|S| = \Omega(d)$, such that
$$\sigma_{\min}^2(VV^{\intercal}A|_S) = \Omega(1)\cdot\max_{j=1}^N\|U_1^{\intercal}a_j\|_2^2. \quad (13)$$

This proves the lemma for $i = 1$ (substitute $W_1 = V$ and $d_1 = d/2$). Applying (13) inductively to $V^{\intercal}A$ establishes the lemma for all $i < k$. In case $i = k$, observe that $d_k = 1$ and we only need to show that the matrix $U_kU_k^{\intercal}A$ has a column with $\ell_2$ norm at least $\max_{j \in [N]}\|U_k^{\intercal}a_j\|_2$. This is trivially true since each column of $U_kU_k^{\intercal}A$ has the same $\ell_2$ norm as the corresponding column of $U_k^{\intercal}A$.

Algorithm 1 Base Decomposition

Input $A = (a_i)_{i=1}^N \in \mathbb{R}^{d \times N}$ ($\mathrm{rank}\,A = d$);
Compute $E = FB_2^d$, the minimum volume enclosing ellipsoid of $K = AB_1^N$;
Let $(u_i)_{i=1}^d$ be the (left) singular vectors of $F$ corresponding to singular values $\sigma_1 \ge \cdots \ge \sigma_d$;
if $d = 1$ then
  Output $U_1 = u_1$.
else
  Let $U_1 = (u_i)_{i > d/2}$ and $V = (u_i)_{i \le d/2}$;
  Recursively compute a base decomposition $V_2, \ldots, V_k$ of $V^{\intercal}A$ ($k \le \lceil 1 + \log d\rceil$ is the depth of the recursion);
  For each $i > 1$, let $U_i = VV_i$;
  Output $\{U_1, \ldots, U_k\}$.
end if



By applying an appropriate unitary transformation to the columns of $A$, we may assume that the major axes of $E$ are collinear with the standard basis vectors of $\mathbb{R}^d$, and, therefore, $F$ is a diagonal matrix with $F_{ii} = \sigma_i$. This transformation comes without loss of generality, since it applies a unitary transformation to the columns of $A$ and $V$ and does not affect the singular values of any matrix $VV^{\intercal}A|_S$ for any $S \subseteq [N]$. For the rest of the proof we assume that $F$ is diagonal.

Since $F$ is diagonal, $u_i$ is equal to $e_i$, the $i$-th standard basis vector. Therefore $U_1U_1^{\intercal}$ is diagonal and equal to the projection onto $e_{d/2+1}, \ldots, e_d$, and $VV^{\intercal}$ is also diagonal and equal to the projection onto $e_1, \ldots, e_{d/2}$. Consider $L = F^{-1}K = F^{-1}AB_1^N$ (recall that we assumed that $\mathrm{rank}\,A = d$ and therefore $F$ is non-singular). Since the minimum enclosing ellipsoid of $K$ is $E = FB_2^d$, the minimum enclosing ellipsoid of $L$ is $B_2^d$. Let $T = VV^{\intercal}$ be the projection onto $e_1, \ldots, e_{d/2}$. Then, by Theorem 7 and because $\|T\|_F^2 = d/2$, there exists a set $S$ of size $|S| = \Omega(d)$ such that $\sigma_{\min}^2(TF^{-1}A|_S) = \Omega(1)$. We chose $F$, and therefore $F^{-1}$, as well as $T$ to be diagonal matrices, so they all commute. Then, since $T$ is a projection matrix, and since the columns of $TF^{-1}A|_S$ lie in the range of $T$, on which $FT$ acts as the diagonal map with entries $\sigma_1 \ge \cdots \ge \sigma_{d/2}$,
$$\sigma_{\min}^2(VV^{\intercal}A|_S) = \sigma_{\min}^2(TA|_S) = \sigma_{\min}^2(FTF^{-1}A|_S) = \sigma_{\min}^2(FT\cdot TF^{-1}A|_S) \ge \sigma_{d/2}^2\,\sigma_{\min}^2(TF^{-1}A|_S) = \Omega(\sigma_{d/2}^2). \quad (14)$$



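Observe that, since $K \subseteq E$, we have
$$U_1^{\intercal}K \subseteq U_1^{\intercal}E = U_1^{\intercal}FB_2^d \subseteq \sigma_{d/2+1}B_2^{d/2}.$$

Therefore, $\max_{j=1}^N\|U_1^{\intercal}a_j\|_2 \le \sigma_{d/2+1} \le \sigma_{d/2}$. Substituting for $\sigma_{d/2}^2$ into (14) completes the proof. ■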
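A compact NumPy sketch of Algorithm 1 follows. It is not the implementation analyzed here: in the spirit of Section 3.4.1, it replaces the exact MEE by an approximate one, computed with a Khachiyan-style coordinate ascent on weights $p$, and reads the axes of the ellipsoid off the eigenvectors of $APA^{\intercal}$ (these are the left singular vectors of $F$ when $FF^{\intercal} \propto APA^{\intercal}$). For odd $d$ it splits at $\lfloor d/2\rfloor$, an assumption of ours, since the text takes $d$ to be a power of 2. Function names and tolerances are hypothetical.

```python
import numpy as np


def mvee_weights(A, tol=1e-2, max_iter=100_000):
    """Approximate MEE weights for K = sym{a_1, ..., a_N} (A has full row rank d).

    Khachiyan-style coordinate ascent: returns p in the simplex with
    a_j^T (A P A^T)^{-1} a_j <= (1 + tol) d for all j, where P = diag(p).
    """
    d, N = A.shape
    p = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        M = (A * p) @ A.T                                      # A P A^T
        g = np.einsum("ij,ij->j", A, np.linalg.solve(M, A))    # a_j^T M^{-1} a_j
        j = int(np.argmax(g))
        if g[j] <= (1.0 + tol) * d:
            break
        step = (g[j] / d - 1.0) / (g[j] - 1.0)                 # exact line search on log det
        p = (1.0 - step) * p
        p[j] += step
    return p


def base_decomposition(A):
    """Sketch of Algorithm 1: returns blocks [U_1, ..., U_k] with orthonormal columns."""
    d = A.shape[0]
    p = mvee_weights(A)
    # The approximate MEE is {x : x^T (A P A^T)^{-1} x <= C d} = F B_2^d with
    # F F^T proportional to A P A^T, so its axes are eigenvectors of A P A^T.
    eigvals, eigvecs = np.linalg.eigh((A * p) @ A.T)
    u = eigvecs[:, np.argsort(eigvals)[::-1]]                  # sigma_1 >= ... >= sigma_d
    if d == 1:
        return [u]
    half = d // 2
    U1 = u[:, half:]            # directions of the smaller half of the axes
    V = u[:, :half]             # directions of the larger half of the axes
    return [U1] + [V @ Vi for Vi in base_decomposition(V.T @ A)]
```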



3.2 The Dense Case: Correlated Gaussian Noise 



Our first result is an efficient algorithm whose expected error matches the spectral lower bound specLB up to polylogarithmic factors and is therefore nearly optimal. This proves Theorem 1. The algorithm adds correlated unbiased Gaussian noise to the exact answer $Ax$. The noise distribution is computed based on the decomposition algorithm of the previous subsection.



Algorithm 2 Gaussian Noise Mechanism

Input (Public): query matrix $A = (a_i)_{i=1}^N \in \mathbb{R}^{d \times N}$ ($\mathrm{rank}\,A = d$);
Input (Private): database $x \in \mathbb{R}^N$
Let $U_1, \ldots, U_k$ be the base decomposition computed by Algorithm 1 on input $A$, where $U_i$ is an orthonormal basis for a space of dimension $d_i$;
Let $c(\varepsilon, \delta) = (1 + \sqrt{2\ln(1/\delta)})/\varepsilon$;
For each $i$, let $r_i = \max_{j=1}^N\|U_i^{\intercal}a_j\|_2$;
For each $i$, sample $w_i \sim N(0, c(\varepsilon, \delta))^{d_i}$;
Output $Ax + \sqrt{k}\sum_{i=1}^kr_iU_iw_i$.

Theorem 13 Let $\mathcal{M}_g(A, x)$ be the output of Algorithm 2 on input a $d \times N$ query matrix $A$ and private input $x$. $\mathcal{M}_g(A, x)$ is $(\varepsilon, \delta)$-differentially private and for all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ satisfies
$$\mathrm{err}_{\mathcal{M}_g}(A) = O(\log^2d\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}\Big(A, \frac{d}{\varepsilon}\Big) = O(\log^2d\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}(A).$$

Notice that even though we assume $\mathrm{rank}\,A = d$, this is without loss of generality: if $\mathrm{rank}\,A = r < d$, we can compute an orthonormal basis $V$ for the range of $A$ and apply the algorithm to $A' = V^{\intercal}A$ to compute an approximation $z$ to $A'x$. We have that $VA'x = VV^{\intercal}Ax = Ax$, since $VV^{\intercal}$ is a projection onto the range of $A$ and $Ax$ belongs to that range. Then $y = Vz$ gives us an approximation of $Ax$ satisfying $\|y - Ax\|_2 = \|z - A'x\|_2$.

We start the proof of Theorem 13 with the privacy analysis. For ease of notation, we assume throughout the analysis that $d$ is a power of 2.

Lemma 6 $\mathcal{M}_g(A, x)$ satisfies $(\varepsilon, \delta)$-differential privacy.

Proof: The lemma follows from Corollary 9. Next we describe in detail why the corollary applies.

Let $U_1, \ldots, U_k$ be the base decomposition computed by Algorithm 1 on input $A$. Let $V_i$ be the subspace spanned by the columns of $U_i$ and let $d_i$ be the dimension of $V_i$. The projection matrix onto $V_i$ is $\Pi_i = U_iU_i^{\intercal}$. Let $E_i$ be the ellipsoid $U_i(r_iB_2^{d_i}) = F_iB_2^{d_i}$ ($F_i$ is $r_iU_i$). By the definition of $r_i$, $U_i^{\intercal}K \subseteq r_iB_2^{d_i}$ and therefore $\Pi_iK \subseteq E_i$. $\mathcal{M}_g(A, x)$ is distributed identically to $Ax + \sqrt{k}\sum_{i=1}^kF_iw_i$. Therefore, by Corollary 9, $\mathcal{M}_g(A, x)$ satisfies $(\varepsilon, \delta)$-differential privacy. ■
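Given a base decomposition, the noise of Algorithm 2 takes only a few lines of NumPy. In the sketch below (ours), `blocks` stands for $U_1, \ldots, U_k$. As the proof of Lemma 6 shows via Corollary 9, privacy only needs the blocks to have orthonormal columns spanning pairwise orthogonal subspaces of $\mathbb{R}^d$ (with $F_i = r_iU_i$); near-optimality additionally needs the blocks produced by Algorithm 1. The helper `expected_error` evaluates $\mathbb{E}\|w\|_2^2 = k\sum_id_ic(\varepsilon,\delta)^2r_i^2$, the quantity bounded in Lemma 7 and Theorem 13.

```python
import math
import numpy as np


def noise_scale(eps, delta):
    """c(eps, delta) = (1 + sqrt(2 ln(1/delta))) / eps."""
    return (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps


def correlated_gaussian_mechanism(A, x, blocks, eps, delta, rng):
    """Sketch of Algorithm 2: output Ax + sqrt(k) * sum_i r_i U_i w_i."""
    k = len(blocks)
    c = noise_scale(eps, delta)
    noise = np.zeros(A.shape[0])
    for U in blocks:
        r = np.linalg.norm(U.T @ A, axis=0).max()     # r_i = max_j ||U_i^T a_j||_2
        w = rng.normal(0.0, c, size=U.shape[1])       # w_i ~ N(0, c(eps, delta))^{d_i}
        noise += r * (U @ w)
    return A @ x + math.sqrt(k) * noise


def expected_error(A, blocks, eps, delta):
    """E||w||_2^2 = k * sum_i d_i c^2 r_i^2 (cross terms vanish since U_i^T U_j = 0)."""
    k = len(blocks)
    c = noise_scale(eps, delta)
    return k * sum(U.shape[1] * (c * np.linalg.norm(U.T @ A, axis=0).max()) ** 2
                   for U in blocks)
```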

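Lemma 7 For all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$, for all $i$,
$$\mathbb{E}\|r_iU_iw_i\|_2^2 = O(\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}\Big(A, \frac{d_i}{\varepsilon}\Big),$$
where $r_i$, $U_i$ and $w_i$ are as defined in Algorithm 2.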
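Proof: To upper bound $\mathbb{E}\|r_iU_iw_i\|_2^2$, observe that, since the columns of $U_i$ are $d_i$ pairwise orthogonal unit vectors, $\|r_iU_iw_i\|_2^2 = r_i^2\|w_i\|_2^2$. Therefore, it follows that
$$\mathbb{E}\|r_iU_iw_i\|_2^2 = d_i\,c(\varepsilon, \delta)^2r_i^2 = O\Big(\log(1/\delta)\,\frac{d_ir_i^2}{\varepsilon^2}\Big). \quad (15)$$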



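By (15), it is enough to lower bound $\mathrm{opt}_{\varepsilon,\delta}(A, d_i/\varepsilon)$ by $\Omega(d_ir_i^2/\varepsilon^2)$. As a first step we lower bound $\mathrm{specLB}(A, d_i)$. Then the lower bound on $\mathrm{opt}_{\varepsilon,\delta}$ will follow from (6) and Lemma 3.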

To lower bound $\mathrm{specLB}(A, d_i)$, we invoke Lemma 5. It follows from the lemma that for every $i$ there exists a projection matrix $\Pi_i = W_iW_i^{\intercal}$ and a set $S_i$ such that $\sigma_{\min}^2(\Pi_iA|_{S_i}) = \Omega(r_i^2)$, and, furthermore, $|S_i| = \Omega(d_i)$. Substituting into the definition of $\mathrm{specLB}(A, d_i)$, we have that for all $i$,
$$\mathrm{specLB}(A, d_i) \ge |S_i|\,\sigma_{\min}^2(\Pi_iA|_{S_i}) = \Omega(d_ir_i^2).$$



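Therefore, by (6), $\mathrm{opt}_{c_1,c_2}(A, d_i) = \Omega(d_ir_i^2)$ for all $i$. Finally, by Lemma 3, there exists a small enough $\delta = \delta(\varepsilon)$ for which $\mathrm{opt}_{\varepsilon,\delta}(A, d_i/\varepsilon) = \Omega(d_ir_i^2/\varepsilon^2)$, and this completes the proof. ■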

Proof of Theorem 13: The proof of the theorem follows from Lemma 6 and Lemma 7. The privacy guarantee is direct from Lemma 6. Next we prove that the error of $\mathcal{M}_g$ is near optimal.

Let $w$ be the noise vector generated by Algorithm 2, so that $w = \sqrt{k}\sum_{i=1}^kr_iU_iw_i$. We have $\mathrm{err}_{\mathcal{M}_g}(A) = \mathbb{E}\|w\|_2^2$. We proceed to upper bound this quantity in terms of $\mathrm{opt}_{\varepsilon,\delta}(A)$.

By Lemma 7, for each $w_i$, $\mathbb{E}\|r_iU_iw_i\|_2^2 = O(\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}(A, d_i/\varepsilon)$. Since $U_i^{\intercal}U_j = 0$ for all $i \ne j$, and $k = O(\log d)$, we can bound $\mathbb{E}\|w\|_2^2$ as
$$\mathbb{E}\|w\|_2^2 = k\sum_{i=1}^k\mathbb{E}\|r_iU_iw_i\|_2^2 = O(\log d\log(1/\delta))\sum_{i=1}^{1+\log d}\mathrm{opt}_{\varepsilon,\delta}\Big(A, \frac{d_i}{\varepsilon}\Big) = O(\log^2d\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}\Big(A, \frac{d}{\varepsilon}\Big).$$

This completes the proof. ■

3.3 The Sparse Case: Least Squares Estimation

In this subsection we present an algorithm with stronger accuracy guarantees than Algorithm 2: it is nearly optimal for any query matrix $A$ and any database size bound $n$ (Theorem 3). The algorithm combines the noise distribution of Algorithm 2 with a least squares estimation step. Privacy is guaranteed by noise addition, while the least squares estimation step reduces the error significantly when $n = o(d/\varepsilon)$. The algorithm is shown as Algorithm 3.



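Algorithm 3 Least Squares Mechanism

Input (Public): query matrix $A = (a_i)_{i=1}^N \in \mathbb{R}^{d \times N}$ ($\mathrm{rank}\,A = d$); database size bound $n$
Input (Private): database $x \in \mathbb{R}^N$
Let $U_1, \ldots, U_k$ be the base decomposition computed by Algorithm 1 on input $A$, where $U_i$ is an orthonormal basis for a space of dimension $d_i$;
Let $t$ be the largest integer such that $d_t \ge \varepsilon n$;
Let $X$ be the matrix whose columns are the columns of $U_1, \ldots, U_t$, and $Y$ the matrix whose columns are the columns of $U_{t+1}, \ldots, U_k$;
Call Algorithm 2 to compute $\tilde{y} = \mathcal{M}_g(A, x)$;
Let $\tilde{y}_1 = XX^{\intercal}\tilde{y}$ and $\tilde{y}_2 = YY^{\intercal}\tilde{y}$;
Let $\hat{y}_1 = \arg\min\{\|z - \tilde{y}_1\|_2^2 : z \in nXX^{\intercal}K\}$, where $K = AB_1^N$;
Output $\hat{y}_1 + \tilde{y}_2$.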
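Algorithm 3 can be sketched on top of the same noise. The least squares step minimizes $\|B\lambda - \tilde{y}_1\|_2^2$ over $\|\lambda\|_1 \le 1$ with $B = nXX^{\intercal}A$, so that $B\lambda$ ranges over $nXX^{\intercal}K$. We solve it by projected gradient descent, a generic solver swapped in for illustration (Section 3.4 discusses the computational aspects). As before, `blocks` is assumed to be ordered by decreasing dimension and to span $\mathbb{R}^d$, so that $YY^{\intercal} = I - XX^{\intercal}$; names and iteration counts are hypothetical.

```python
import math
import numpy as np


def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {z : ||z||_1 <= radius} (sort-and-threshold)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def least_squares_mechanism(A, x, n, blocks, eps, delta, rng, iters=3000):
    """Sketch of Algorithm 3; `blocks` = [U_1, ..., U_k], decreasing d_i, spanning R^d."""
    k = len(blocks)
    c = (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps
    noise = np.zeros(A.shape[0])
    for U in blocks:                                    # noise of Algorithm 2
        r = np.linalg.norm(U.T @ A, axis=0).max()
        noise += r * (U @ rng.normal(0.0, c, size=U.shape[1]))
    y_tilde = A @ x + math.sqrt(k) * noise
    # t = largest index with d_t >= eps * n (0 if there is none).
    t = max([i + 1 for i, U in enumerate(blocks) if U.shape[1] >= eps * n], default=0)
    if t == 0:
        return y_tilde
    X = np.hstack(blocks[:t])
    P = X @ X.T                                         # projection X X^T
    y1 = P @ y_tilde
    y2 = y_tilde - y1                                   # equals Y Y^T y_tilde
    # Least squares over n X X^T K: min ||B lam - y1||^2 subject to ||lam||_1 <= 1.
    B = n * (P @ A)
    step = 1.0 / max(np.linalg.norm(B, 2) ** 2, 1e-12)
    lam = np.zeros(A.shape[1])
    for _ in range(iters):                              # projected gradient descent
        lam = project_l1_ball(lam - step * (B.T @ (B @ lam - y1)))
    return B @ lam + y2
```

Because both $\hat{y}_1$ and $y_1$ lie in $nXX^{\intercal}K$, the error on the range of $X$ never exceeds the diameter of that body, however large the added noise is.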



Theorem 14 Let $\mathcal{M}_\ell(A, x, n)$ be the output of Algorithm 3 on input a $d \times N$ query matrix $A$, database size bound $n$ and private input $x$. $\mathcal{M}_\ell(A, x, n)$ is $(\varepsilon, \delta)$-differentially private and for all small enough $\varepsilon$ and all $\delta$ small enough with respect to $\varepsilon$ satisfies
$$\mathrm{err}_{\mathcal{M}_\ell}(A, n) = O\Big(\log^{3/2}d\sqrt{\log N\log(1/\delta)} + \log^2d\log(1/\delta)\Big)\,\mathrm{opt}_{\varepsilon,\delta}(A, n).$$

Lemma 8 $\mathcal{M}_\ell(A, x, n)$ satisfies $(\varepsilon, \delta)$-differential privacy.

Proof: The output of $\mathcal{M}_\ell(A, x, n)$ is a deterministic function of the output of $\mathcal{M}_g(A, x)$. By Lemma 6, $\mathcal{M}_g(A, x)$ satisfies $(\varepsilon, \delta)$-differential privacy, and, therefore, by Lemma 2 (i.e. the post-processing property of differential privacy), $\mathcal{M}_\ell(A, x, n)$ satisfies $(\varepsilon, \delta)$-differential privacy. ■

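Lemma 9 Let $U_i$ and $t$ be as defined in Algorithm 3 and let $r_i$ and $w_i$ be as defined in Algorithm 2. Then for all $i \le t$,
$$\mathbb{E}\max_{j=1}^Nn|\langle a_j, r_iU_iw_i \rangle| \le O\big(\sqrt{\log N\log(1/\delta)}\big)\,\mathrm{opt}_{\varepsilon,\delta}(A, n).$$

Proof: First we upper bound $\mathbb{E}\max_{j=1}^Nn|\langle a_j, r_iU_iw_i \rangle|$. After rearranging the terms, we have $\langle a_j, r_iU_iw_i \rangle = \langle U_i^{\intercal}a_j, r_iw_i \rangle$ for any $i$ and $j$. By the definition of $r_i$, $\|U_i^{\intercal}a_j\|_2 \le r_i$. Therefore, $\langle U_i^{\intercal}a_j, r_iw_i \rangle$ is a Gaussian random variable with mean $0$ and standard deviation at most $r_i^2c(\varepsilon, \delta)$. Using standard bounds on the expected maximum of $N$ Gaussians (e.g. using a Chernoff bound and a union bound), we have that
$$\mathbb{E}\max_{j=1}^Nn|\langle a_j, r_iU_iw_i \rangle| = n\,\mathbb{E}\max_{j=1}^N|\langle U_i^{\intercal}a_j, r_iw_i \rangle| = O\big(\sqrt{\log N}\,c(\varepsilon, \delta)\,nr_i^2\big) = O\Big(\sqrt{\log N\log(1/\delta)}\cdot\frac{nr_i^2}{\varepsilon}\Big).$$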

To finish the proof of the lemma, we need to lower bound $\mathrm{opt}_{\varepsilon,\delta}(A, n)$ by $\Omega(nr_i^2/\varepsilon)$. We will use Lemma 5 to lower bound $\mathrm{specLB}(A, \varepsilon n)$ by $\Omega(\varepsilon nr_i^2)$ and then we will invoke Lemma 3 to get the right dependence on $\varepsilon$.

By Lemma 5, for every $i$ there exists a projection matrix $\Pi_i = W_iW_i^{\intercal}$ and a set $S_i$ such that $\sigma_{\min}^2(\Pi_iA|_{S_i}) = \Omega(r_i^2)$, and, furthermore, $|S_i| = \Omega(d_i)$. By the definition of $t$, for all $i \le t$, $d_i \ge \varepsilon n$, and, therefore, $|S_i| = \Omega(\varepsilon n)$. Take $T_i \subseteq S_i$ to be an arbitrary subset of $S_i$ of size $\Omega(\varepsilon n)$ and at most $\varepsilon n$. The smallest singular value of $\Pi_iA|_{S_i}$ is a lower bound on the smallest singular value of $\Pi_iA|_{T_i}$:
$$\sigma_{\min}(\Pi_iA|_{T_i}) = \min_{x: \|x\|_2 = 1}\|(\Pi_iA|_{T_i})x\|_2 = \min_{\substack{x: \|x\|_2 = 1 \\ \mathrm{supp}(x) \subseteq T_i}}\|(\Pi_iA|_{S_i})x\|_2 \ge \min_{x: \|x\|_2 = 1}\|(\Pi_iA|_{S_i})x\|_2 = \sigma_{\min}(\Pi_iA|_{S_i}),$$
where $\mathrm{supp}(x)$ is the subset of coordinates on which $x$ is nonzero. Therefore, $\sigma_{\min}^2(\Pi_iA|_{T_i}) = \Omega(r_i^2)$. Substituting into the definition of $\mathrm{specLB}(A, \varepsilon n)$, we have
$$\mathrm{specLB}(A, \varepsilon n) = \Omega\big(|T_i|\,\sigma_{\min}^2(\Pi_iA|_{T_i})\big) = \Omega(\varepsilon nr_i^2).$$

Therefore, by (6), $\mathrm{opt}_{c_1,c_2}(A, \varepsilon n) = \Omega(\varepsilon nr_i^2)$ for all $i \le t$. Finally, by Lemma 3, there exists a small enough $\delta = \delta(\varepsilon)$ for which $\mathrm{opt}_{\varepsilon,\delta}(A, n) = \Omega(nr_i^2/\varepsilon)$, and this completes the proof. ■
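The standard bound on the expected maximum of $N$ Gaussians used in the proof of Lemma 9 can be made explicit: if each $g_j$ is Gaussian with mean 0 and standard deviation at most $s$ (independence is not needed), then $\mathbb{E}\max_j|g_j| \le s\sqrt{2\ln(2N)}$, by Jensen's inequality applied to the moment generating functions of the $2N$ variables $\pm g_j$. The sketch below (ours) compares this bound with a Monte Carlo estimate for independent Gaussians.

```python
import math
import numpy as np


def max_gaussian_bound(N, s=1.0):
    """Upper bound s * sqrt(2 ln(2N)) on E max_j |g_j| for N centered Gaussians with std <= s."""
    return s * math.sqrt(2.0 * math.log(2.0 * N))


def empirical_max(N, s, trials, rng):
    """Monte Carlo estimate of E max_j |g_j| for N independent N(0, s) variables."""
    g = rng.normal(0.0, s, size=(trials, N))
    return float(np.abs(g).max(axis=1).mean())
```

The bound grows only like $\sqrt{\log N}$, which is why the universe size $N$ enters Theorem 14 only through a $\sqrt{\log N}$ factor.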

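Proof of Theorem 14: The privacy guarantee is direct from Lemma 8. Next we prove that the error of $\mathcal{M}_\ell$ is near optimal.

We will bound $\mathrm{err}_{\mathcal{M}_\ell}(A, n)$ by bounding $\mathbb{E}\|\hat{y}_1 + \tilde{y}_2 - Ax\|_2^2$ for any $x$ with $\|x\|_1 \le n$. Let us fix such an $x$ and define $y = Ax$; furthermore, define $y_1 = XX^{\intercal}y$ and $y_2 = YY^{\intercal}y$. Since $\hat{y}_1 - y_1$ lies in the range of $X$ and $\tilde{y}_2 - y_2$ lies in the range of $Y$, which is orthogonal to it, by the Pythagorean theorem we can write
$$\mathbb{E}\|\hat{y}_1 + \tilde{y}_2 - y\|_2^2 = \mathbb{E}\|\hat{y}_1 - y_1\|_2^2 + \mathbb{E}\|\tilde{y}_2 - y_2\|_2^2. \quad (16)$$

We show that the first term on the right hand side of (16) is at most $O(\log^{3/2}d\sqrt{\log N\log(1/\delta)})\,\mathrm{opt}_{\varepsilon,\delta}(A, n)$ and the second term is $O(\log^2d\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}(A, n)$.

The bound on $\mathbb{E}\|\tilde{y}_2 - y_2\|_2^2$ follows from Theorem 13. More precisely, $Y^{\intercal}\tilde{y}_2$ is distributed identically to the output of $\mathcal{M}_g(Y^{\intercal}A, x)$, and by Theorem 13,
$$\mathbb{E}\|\tilde{y}_2 - y_2\|_2^2 = \mathbb{E}\|Y^{\intercal}\tilde{y}_2 - Y^{\intercal}Ax\|_2^2 = O(\log^2d\log(1/\delta))\,\mathrm{opt}_{\varepsilon,\delta}\Big(A, \frac{d_{t+1}}{\varepsilon}\Big).$$

Since, by the definition of $t$, $d_{t+1}/\varepsilon < n$, we have the desired bound.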

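The bound on $\mathbb{E}\|\hat{y}_1 - y_1\|_2^2$ follows from Lemma 1 and Lemma 9. We use the notation $r_i$, $U_i$, and $w_i$ defined in Algorithms 2 and 3. Let $L = nXX^{\intercal}K$; since $\|x\|_1 \le n$, we have $y_1 = XX^{\intercal}Ax \in L$, and by Lemma 1,
$$\mathbb{E}\|\hat{y}_1 - y_1\|_2^2 \le 4\,\mathbb{E}\|\tilde{y}_1 - y_1\|_{L^\circ} = 4\,\mathbb{E}\max_{j=1}^N|\langle na_j, XX^{\intercal}(\tilde{y} - y) \rangle|.$$

The last equality follows from the definition of the dual norm $\|\cdot\|_{L^\circ}$ and from the fact that $L$ is a polytope with vertices among $\{\pm nXX^{\intercal}a_j\}_{j=1}^N$, so any linear functional on $L$ is maximized at one of the vertices. From the fact that for all $i \ne j$ we have $U_i^{\intercal}U_j = 0$, from the triangle inequality, and from Lemma 9 we derive
$$\begin{aligned}
\mathbb{E}\max_{j=1}^N|\langle na_j, XX^{\intercal}(\tilde{y} - y) \rangle| &= \mathbb{E}\max_{j=1}^Nn\Big|\sqrt{k}\sum_{i=1}^t\langle a_j, r_iU_iw_i \rangle\Big| \\
&\le \sqrt{k}\sum_{i=1}^t\mathbb{E}\max_{j=1}^Nn|\langle a_j, r_iU_iw_i \rangle| \\
&\le O\big(\sqrt{k}\,t\sqrt{\log N\log(1/\delta)}\big)\,\mathrm{opt}_{\varepsilon,\delta}(A, n) \\
&= O\big(\log^{3/2}d\sqrt{\log N\log(1/\delta)}\big)\,\mathrm{opt}_{\varepsilon,\delta}(A, n).
\end{aligned}$$

This completes the proof. ■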
3.4 Computational Complexity 

In this subsection we consider the computational complexity of our algorithms. We pay special attention to 
approximating the minimum enclosing ellipsoid of a polytope and computing least squares estimators. For both 
problems we need to go into the properties of known approximation algorithms in order to verify that the 
approximations are sufficient to guarantee that our algorithms can be implemented in polynomial time without 
hurting their near-optimality. 

3.4.1 Minimum Enclosing Ellipsoid 

Computationally the most expensive step of the base decomposition algorithm (Algorithm 1) is computing the minimum enclosing ellipsoid $E$ of $K$. Computing the exact MEE can be costly: the fastest known algorithms have complexity on the order of $d^{O(d)}N$ [AS93]. However, for our purposes it is enough to compute an approximation of $E$ in Banach-Mazur distance, i.e. some ellipsoid $E'$ satisfying $\frac{1}{C}E' \subseteq E \subseteq E'$ for an absolute constant $C$. Known approximation algorithms for MEE guarantee that their output is an enclosing ellipsoid with volume approximately equal to that of the MEE [Kha96, TY07]. It is not immediately clear whether such an approximation is also a Banach-Mazur approximation. However, we can use the fact that the algorithms in [Kha96, KY05, TY07] output an ellipsoid $E'$ satisfying approximate complementary slackness conditions and show that $\Pi E'$ approximates $\Pi E$ in the Banach-Mazur sense for some projection $\Pi$ onto a subspace of dimension $\Omega(d)$. This suffices for a slightly weaker version of Lemma 5.

We begin with a definition. We say that a vector $p \in [0,1]^N$ is $C$-optimal for $A = (a_i)_{i=1}^N$ if the following conditions are satisfied:

1. $\sum_{i=1}^N p_i = 1$;

2. for all $i \in [N]$, $a_i^T(APA^T)^{-1}a_i \le C\cdot d$, where $P = \mathrm{diag}(p)$ (we use this notation throughout this section).



The $C$-optimality conditions are a relaxation of the Karush-Kuhn-Tucker conditions of a formulation of the MEE problem as a convex program. The approximation algorithm for the MEE due to Khachiyan [Kha96], and later follow-up work [KY05, TY07], compute a $C$-optimal $p$ and output an approximate MEE $E(p) = \{x : x^T(APA^T)^{-1}x \le Cd\}$. The running time of the algorithm in [TY07] is $O(d^2N)$, where the $O$ notation suppresses dependence on $C$ as well as polylogarithmic terms.
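To make $C$-optimality concrete, here is a minimal pure-Python sketch of a Khachiyan-style coordinate-ascent method for the symmetric (origin-centred) case. It is an illustration under stated assumptions, not the algorithm of [Kha96] or [TY07] as published: we assume the columns of $A$ span $\mathbb{R}^d$, move $p$ toward the vertex $e_j$ with the largest $\kappa_j = a_j^T(APA^T)^{-1}a_j$ using the step size from exact line search on $\log\det(APA^T)$, and stop as soon as $\kappa_i \le C\cdot d$ for all $i$. The helper names `solve` and `khachiyan_c_optimal`, the default $C = 1.01$, and the iteration cap are hypothetical choices.

```python
def solve(M, b):
    """Solve the small dense system M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (aug[r][n] - sum(aug[r][c] * z[c] for c in range(r + 1, n))) / aug[r][r]
    return z


def khachiyan_c_optimal(cols, C=1.01, max_iter=100000):
    """Coordinate ascent on log det(A P A^T) until p is C-optimal.

    cols: the N columns a_i of A (lists of d floats); assumed to span R^d.
    Returns (p, kappa) with kappa[i] = a_i^T (A P A^T)^{-1} a_i.
    """
    d, N = len(cols[0]), len(cols)
    p = [1.0 / N] * N  # start at the barycentre of the probability simplex
    for _ in range(max_iter):
        # A P A^T = sum_i p_i a_i a_i^T
        M = [[sum(p[i] * cols[i][r] * cols[i][s] for i in range(N)) for s in range(d)]
             for r in range(d)]
        kappa = [sum(ar * zr for ar, zr in zip(a, solve(M, a))) for a in cols]
        j = max(range(N), key=lambda i: kappa[i])
        if kappa[j] <= C * d:  # every column satisfies the C-optimality bound
            return p, kappa
        # Exact line search for log det along p -> (1 - beta) p + beta e_j.
        beta = (kappa[j] / d - 1.0) / (kappa[j] - 1.0)
        p = [(1.0 - beta) * pi for pi in p]
        p[j] += beta
    raise RuntimeError("no C-optimal point found within max_iter iterations")


p, kappa = khachiyan_c_optimal([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```

Every update is a convex combination, so $p$ stays in the probability simplex, and at every iterate $\sum_i p_i\kappa_i = \mathrm{tr}(I) = d$; the stopping rule only has to control the largest $\kappa_i$.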

$C$-optimality implies the following property of the ellipsoid $E(p)$, which is key to our analysis.



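Lemma 10 Let $E^* = F^*B_2^d$ be the minimum enclosing ellipsoid of $K = \mathrm{sym}\{a_i\}_{i=1}^N$, and let $p$ be $C$-optimal for $A$. Let also $E(p) = \{x : x^T(APA^T)^{-1}x \le Cd\} = F(p)B_2^d$. Then, with singular values indexed in non-increasing order,

$$\sigma_{d/2}^2(F(p)) \le 4C\,\sigma_{d/4}^2(F^*).$$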



Proof: Let $G = (F^*)^{-1}$. Since $GE^* = B_2^d$, the MEE of $GK$ is the unit ball and, therefore, $\|Ga_i\|_2 \le 1$ for all $i \in [N]$. Since $F(p)F(p)^T = Cd\cdot APA^T$, we have

$$\frac{1}{d}\sum_{i=1}^d \sigma_i^2(GF(p)) = \frac{1}{d}\mathrm{tr}\big(GF(p)F(p)^TG^T\big) = C\,\mathrm{tr}(GAPA^TG^T) = C\sum_{i=1}^N p_i\|Ga_i\|_2^2 \le C.$$

By Markov's inequality (at most $d/4$ of the values $\sigma_i^2(GF(p))$ can exceed $4C$), $\sigma_{d/4}^2(GF(p)) \le 4C$. Let $\Pi_1$ be the projection operator onto the subspace spanned by the left singular vectors of $GF(p)$ corresponding to $\sigma_{d/4}(GF(p)), \ldots, \sigma_d(GF(p))$. We have $\Pi_1GE(p) \subseteq 2C^{1/2}\Pi_1B_2^d$. Multiplying on both sides by $F^*$, we get

$$F^*\Pi_1GE(p) \subseteq 2C^{1/2}F^*\Pi_1B_2^d.$$

Let $\Pi_2$ be the matrix $\Pi_2 = G^{-1}\Pi_1G = F^*\Pi_1G$. Since $\Pi_2$ is similar to $\Pi_1$, it is also a projection matrix onto a $3d/4$-dimensional subspace. We have that $F^*\Pi_1 = \Pi_2F^*$, and therefore

$$\Pi_2E(p) \subseteq 2C^{1/2}\Pi_2E^*.$$

Define $H = 4CF^*(F^*)^T - F(p)F(p)^T$. The inclusion above is equivalent to the positive semidefiniteness of the matrix $\Pi_2H\Pi_2^T$. As $\Pi_2$ is a projection onto a $3d/4$-dimensional subspace, by the standard minimax characterization of eigenvalues we have $\lambda_{3d/4}(H) \ge 0$. We recall the (dual) Weyl inequalities for symmetric $d\times d$ matrices $X$ and $Y$:

$$\lambda_i(X) + \lambda_j(Y) \le \lambda_{i+j-d}(X+Y).$$

The inequalities are standard and follow from the minimax characterization of eigenvalues and dimension-counting arguments; see, e.g., Chapter 1 in [Tao12]. Substituting $X = H$ and $Y = F(p)F(p)^T$, $i = 3d/4$ and $j = d/2$, we have the inequality

$$\sigma_{d/2}^2(F(p)) = \lambda_{d/2}\big(F(p)F(p)^T\big) \le \lambda_{d/4}\big(4CF^*(F^*)^T\big) = 4C\,\sigma_{d/4}^2(F^*),$$

and the proof is complete. ■

Finally, we give an analogue of Lemma 5 for the variant of the base decomposition algorithm that uses an approximate MEE. The proof follows from Lemma 10 and the arguments used to prove Lemma 5; we omit a full proof here.

Lemma 11 Consider a variant of Algorithm 1 that, at each step, uses $E(p) = \{x : x^T(APA^T)^{-1}x \le Cd\} = F(p)B_2^d$, where $p$ is $O(1)$-optimal for $A$, rather than the minimum enclosing ellipsoid $E = FB_2^d$. Let $d_i$ be the dimension of the span of $U_i$. For any $i$ there exists a subspace $W_i$ of dimension $\Omega(d_i)$ and a set $S_i \subseteq [N]$ of size $|S_i| = \Omega(d_i)$, such that $\sigma_{\min}^2(W_iW_i^TA|_{S_i}) = \Omega(1)\max_{j=1}^N\|U_i^Ta_j\|_2^2$.

One can verify that in all our proofs we can substitute Lemma 11 for Lemma 5 without changing the asymptotics of our lower and upper bounds. Therefore, in all our algorithms we can use the variant of Algorithm 1 from Lemma 11 without compromising near-optimality. This variant of Algorithm 1 runs in polynomial time.



Notice that the base decomposition can be reused for different databases, as long as the query matrix $A$ stays unchanged; once the decomposition is computed, the rest of the algorithm is very efficient: it involves some standard algebraic computations and sampling from an $O(d)$-dimensional Gaussian distribution. Furthermore, any ellipsoid $E'$ containing $K$ suffices for privacy, and one may use heuristic approximations to the MEE problem.

3.4.2 Least Squares Estimator 

Except for the base decomposition, the other potentially computationally expensive step in Algorithm 3 is the computation of a least squares estimator $\hat y_1$. This is a quadratic minimization problem, and it can be approximated by the simple Frank-Wolfe gradient descent algorithm [FW56]. In particular, for a point $y'$ such that $\|y' - \tilde y\|_2^2 \le \min_{\hat y\in L}\|\hat y - \tilde y\|_2^2 + \alpha$, Lemma 1 holds to within an additive approximation factor $\alpha$, i.e. $\|y' - y\|_2^2 \le 4\|w\|_{L^\circ} + \alpha$. We call such a point $y'$ an $\alpha$-additive approximation to the least squares estimator problem. By the analysis of Clarkson [Cla10], $T$ iterations of the Frank-Wolfe algorithm give an additive approximation with $\alpha \le 4C(L)/(T+3)$, for $C(L) \le \sup_{u,v\in L}|\langle u, u - v\rangle|$. In our case $L = nXX^TK$. In order to have near-optimality for Algorithm 3, an additive approximation $\alpha \le \sum_{i=1}^t nr_i^2$ suffices. Using the triangle inequality and Cauchy-Schwarz, we can bound $C(L)$ for $L = nXX^TK$ as

$$C(L) \le \sum_{i=1}^t \sup_{u,v\in nU_iU_i^TK}|\langle u, u - v\rangle| \le \sum_{i=1}^t 2n^2r_i^2.$$

Therefore, $T = O(n)$ iterations of the Frank-Wolfe algorithm suffice. Since each iteration of the algorithm involves $N$ dot product computations and solving a homogeneous linear system in at most $d$ variables and at most $d$ equations, it follows that an approximate version of Algorithm 3 with unchanged asymptotic optimality guarantees can be implemented in time $d^{O(1)}Nn$.
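Frank-Wolfe only needs a linear minimization oracle over the symmetric polytope, and since a linear functional on a polytope is optimized at a vertex, the oracle is a scan over the generators $\pm v_j$. The following is a minimal sketch under assumptions: the generators are given explicitly as lists (in Algorithm 3 they would be $v_j = nXX^Ta_j$), the iteration starts at the origin, and it uses the common step size $2/(t+2)$; this schedule is our choice, and the constants in the analysis of [Cla10] may correspond to a slightly different one. Function names are hypothetical.

```python
def frank_wolfe_least_squares(vertices, target, iters=1000):
    """Approximate argmin { ||y - target||_2^2 : y in conv{+v_j, -v_j} } with Frank-Wolfe.

    vertices: generators v_j of the symmetric polytope; target: the noisy point.
    """
    y = [0.0] * len(target)  # the origin lies in every symmetric polytope
    for t in range(iters):
        grad = [2.0 * (yi - ti) for yi, ti in zip(y, target)]
        # Linear minimization oracle: over conv{+-v_j}, <grad, s> is minimized at
        # s = -sign(<grad, v_j>) v_j for the j maximizing |<grad, v_j>|.
        scores = [sum(g * v for g, v in zip(grad, vj)) for vj in vertices]
        j = max(range(len(vertices)), key=lambda k: abs(scores[k]))
        sign = -1.0 if scores[j] > 0 else 1.0
        s = [sign * v for v in vertices[j]]
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        y = [(1.0 - gamma) * yi + gamma * si for yi, si in zip(y, s)]
    return y


def objective(y, target):
    """Squared Euclidean distance ||y - target||_2^2."""
    return sum((yi - ti) ** 2 for yi, ti in zip(y, target))


# Projection of (1, 1) onto the cross-polytope conv{+-e_1, +-e_2}; the exact answer is (0.5, 0.5).
y = frank_wolfe_least_squares([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
```

Each iterate is a convex combination of the origin and polytope vertices, so the returned point is always feasible; only its optimality gap depends on the number of iterations.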

We note that the approximation algorithm of Khachiyan for the MEE problem, as well as its modification in [TY07], can also be interpreted as an instance of the Frank-Wolfe algorithm (see [TY07] for details).



4 Results for Pure Privacy 

Our geometric approach to approximate privacy allows us to better understand the optimal error required for approximate privacy vs. that required for pure privacy. Our first result bounds the gap between the optimal error bounds for the two notions of privacy in the dense case. Then we extend these ideas and give an $(\epsilon,0)$-differentially private algorithm which nearly matches the guarantees of Algorithm 3 for sparse databases.

4.1 The Cost of Pure Privacy 

In this subsection we investigate the worst-case gap between $\mathrm{opt}_{\epsilon,\delta}(A)$ (for small enough $\delta > 0$) and $\mathrm{opt}_{\epsilon,0}(A)$ over all query matrices $A$. At the core of our analysis is a natural geometric fact: for any symmetric polytope $K$ with $N$ vertices in $d$-dimensional space, we can find a subset of $d$ vertices of $K$ whose symmetric convex hull has volume radius at most a factor $O(\sqrt{\log(N/d)})$ smaller than the volume radius of $K$. Our proof of this fact goes through analyzing the contact points of $K$ with its minimum volume enclosing ellipsoid, and a bound on the volume of polytopes with few vertices.



Theorem 15 Let $K = \mathrm{sym}\{a_1, \ldots, a_N\} \subseteq \mathbb{R}^d$ and let $E$ be an ellipsoid of minimal volume containing $K$. There exists a set $S \subseteq [N]$ of size $d$ such that the matrix $A|_S = (a_i)_{i\in S}$ satisfies $|\det(A|_S)|^{1/d} = \Omega(\mathrm{vrad}(E))$.
For the proof of Theorem 15 we will use John's theorem (Theorem 6) and the following elementary algebraic result.

Lemma 12 Let $u_1, \ldots, u_m$ be $d$-dimensional unit vectors and let $c_1, \ldots, c_m$ be positive reals such that

$$\sum_{i=1}^m c_iu_iu_i^T = I. \qquad (17)$$

Then there exists a set $S \subseteq [m]$ of size $d$ such that the matrix $U|_S = (u_i)_{i\in S}$ satisfies $|\det(U|_S)|^{1/d} = \Omega(1)$.



Proof: Notice that $\mathrm{tr}(u_iu_i^T) = \|u_i\|_2^2 = 1$ for all $i$. By taking traces of both sides of (17), we have $\sum_i c_i = d$.

Let $U = (u_i)_{i=1}^m$ and let $C$ be the $m\times m$ diagonal matrix with $(c_i)_{i=1}^m$ on the diagonal. Then we can write $I = \sum_i c_iu_iu_i^T = UCU^T$. By the Binet-Cauchy formula for the determinant,

$$\begin{aligned}
1 = \det(UCU^T) &= \sum_{S\subseteq[m]:|S|=d}\ \prod_{i\in S}c_i\,\det(U|_S)^2\\
&\le \max_{S\subseteq[m]:|S|=d}\det(U|_S)^2\sum_{S\subseteq[m]:|S|=d}\ \prod_{i\in S}c_i\\
&\le \max_{S\subseteq[m]:|S|=d}\det(U|_S)^2\,\frac{d^d}{d!}. \qquad (18)
\end{aligned}$$

The inequality (18) follows since each term $\prod_{i\in S}c_i$ appears $d!$ times in the expansion of $(\sum_i c_i)^d = d^d$, and all other terms in the expansion are positive. Using the inequality $d! \ge (d/e)^d$, we have that $\max_{S\subseteq[m]:|S|=d}\det(U|_S)^{2/d} \ge 1/e$, and this completes the proof. ■
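Lemma 12 can be checked by brute force on a small instance. The sketch below uses the equiangular frame of three unit vectors at 120 degrees in $\mathbb{R}^2$, for which $\sum_i \frac{2}{3}u_iu_i^T = I$; the example and the helper names are ours. The proof above gives $\max_{|S|=d}|\det(U|_S)|^{2/d} \ge 1/e$, i.e. $|\det(U|_S)|^{1/d} \ge e^{-1/2} \approx 0.61$ for the best $S$; in this instance every pair of vectors has $|\det| = \sqrt3/2$.

```python
import itertools
import math


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in M]
    n = len(M)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-15:
            return 0.0
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return result


def best_subset_value(us):
    """max over |S| = d of |det(U|_S)|^(1/d) for vectors us in R^d (det of U^T equals det of U)."""
    d = len(us[0])
    return max(abs(det([us[i] for i in S])) ** (1.0 / d)
               for S in itertools.combinations(range(len(us)), d))


# Three unit vectors at 120 degrees with weights c_i = 2/3 form an isotropic decomposition of I.
us = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]
c = [2.0 / 3.0] * 3
iso = [[sum(c[i] * us[i][r] * us[i][s] for i in range(3)) for s in range(2)] for r in range(2)]
value = best_subset_value(us)
```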



Proof of Theorem 15: We can write the minimum enclosing ellipsoid $E$ as $\mathrm{vrad}(E)FB_2^d$, where $F$ is a linear map with determinant 1. Since $F^{-1}$ does not change volumes, $B_2^d$ is a minimal volume ellipsoid of $\mathrm{vrad}(E)^{-1}F^{-1}K$. Also, for any $A|_S = (a_i)_{i\in S}$ where $S \subseteq [N]$ and $|S| = d$, we have

$$\det(A|_S) = \mathrm{vrad}(E)^d\det\big(\mathrm{vrad}(E)^{-1}F^{-1}A|_S\big).$$

Therefore, it is sufficient to show that for $L = \mathrm{sym}\{u_1, \ldots, u_N\}$ such that $B_2^d$ is the minimal volume ellipsoid of $L$, there exists a set $S \subseteq [N]$ of size $d$ such that the matrix $U|_S = (u_i)_{i\in S}$ satisfies $|\det(U|_S)|^{1/d} = \Omega(1)$. Since, by convexity, the contact points $L \cap S^{d-1}$ of $L$ are a subset of $\{\pm u_1, \ldots, \pm u_N\}$, the statement follows from Theorem 6 and Lemma 12. ■



Combined with the following theorem of Bárány and Füredi [BF88], and also Gluskin [Glu07] (with sharper bounds), Theorem 15 implies the corollary that for any $d$-dimensional symmetric polytope one can find a set of $d$ vertices whose symmetric convex hull captures a significant fraction of the volume of the polytope (up to an $O(\sqrt{\log(N/d)})$ factor in volume radius).
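Theorem 16 ([BF88, Glu07]) Let $K = \mathrm{sym}\{a_1, \ldots, a_N\} \subseteq \mathbb{R}^d$ and let $E$ be an ellipsoid containing $K$. Then

$$\mathrm{vrad}(K) \le O\Big(\sqrt{\frac{\log(N/d)}{d}}\Big)\,\mathrm{vrad}(E).$$

Corollary 1 For any $K = \mathrm{sym}\{a_1, \ldots, a_N\} \subseteq \mathbb{R}^d$ there exists a set $S \subseteq [N]$ of size $d$ such that

$$|\det(A|_S)|^{1/d} = \Omega\Big(\sqrt{\frac{d}{\log(N/d)}}\Big)\,\mathrm{vrad}(K).$$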
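A sanity check on the scaling in Theorem 16 and Corollary 1: for $|S| = d$, the symmetric convex hull of $\{a_i\}_{i\in S}$ is $A|_SB_1^d$, so its volume radius equals $|\det(A|_S)|^{1/d}\,\mathrm{vrad}(B_1^d)$. Assuming the standard definition $\mathrm{vrad}(K) = (\mathrm{vol}(K)/\mathrm{vol}(B_2^d))^{1/d}$, the sketch below computes $\mathrm{vrad}(B_1^d)$ from the closed-form volumes $2^d/d!$ and $\pi^{d/2}/\Gamma(d/2+1)$ and shows that $\sqrt d\cdot\mathrm{vrad}(B_1^d)$ stays bounded (it increases toward $\sqrt{2e/\pi} \approx 1.32$). This $1/\sqrt d$ factor is what converts a bound on $|\det(A|_S)|^{1/d}$ into a bound on the volume radius of the symmetric hull. The function name is ours.

```python
import math


def vrad_cross_polytope(d):
    """Volume radius (vol(B_1^d) / vol(B_2^d))^(1/d) of the unit cross-polytope, computed in log space."""
    log_vol_l1 = d * math.log(2.0) - math.lgamma(d + 1)                  # vol(B_1^d) = 2^d / d!
    log_vol_l2 = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1)  # vol(B_2^d)
    return math.exp((log_vol_l1 - log_vol_l2) / d)


for d in (1, 2, 10, 100, 1000):
    print(d, round(math.sqrt(d) * vrad_cross_polytope(d), 4))
```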



Finally, we describe the application to differential privacy. By Corollary 1, $\mathrm{volLB}(A,\epsilon) = O\big(\frac{1}{\epsilon^2}\log(N/d)\big)\,\mathrm{detLB}(A, d)$. Also, by (11), $\mathrm{detLB}(A, d) = O(\log^2 d)\,\mathrm{opt}_{c_1,c_2}(A, d)$. Finally, Lemma 3 implies that $\mathrm{opt}_{c_1,c_2}(A, d) \le \epsilon^2\,\mathrm{opt}_{\epsilon,\delta}(A, d/\epsilon)$ for $\delta$ small enough with respect to $\epsilon$. Putting all this together and using Theorem 12, we have the following theorem.



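Theorem 17 For small enough $\epsilon$ and all $\delta$ small enough with respect to $\epsilon$, for any $d\times N$ real matrix $A$ we have

$$\mathrm{opt}_{\epsilon,0}(A) = O(\log^3 d)\,\mathrm{volLB}(A,\epsilon) = O(\log^5 d\log(N/d))\,\mathrm{opt}_{\epsilon,\delta}(A, d/\epsilon).$$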






4.2 Sparse Case under Pure Privacy 



We further extend our results from Section 3.3 and show an efficient $(\epsilon,0)$-differentially private algorithm which, on input any query matrix $A$ and any database size bound $n$, nearly matches $\mathrm{opt}_{\epsilon,0}(A,n)$. This proves our main theorem. In fact, our result is stronger: we show an $(\epsilon,0)$-differentially private mechanism whose error nearly matches $\mathrm{opt}_{\epsilon,\delta}(A,n)$ for all $\delta$ small enough with respect to $\epsilon$. Thus, the result of this subsection can be seen as a generalization of Theorem 17 to the sparse database regime.

Our algorithm for sparse databases under pure privacy closely follows Algorithm 3: we add noise from a distribution that is tailored to $A$ but oblivious to the database $x$; then we use least squares estimation to reduce error on sparse databases. However, Gaussian noise does not preserve $(\epsilon,0)$-differential privacy, and we need to use a different noise distribution. Intuitively, one expects that adding noise sampled from a near-optimal distribution [HT10, BDKT12] and then computing a least squares estimator would be nearly optimal, analogously to Algorithm 3. We are not currently able to analyze the error of this algorithm; instead, we analyze a variant of Algorithm 3 in which the Gaussian distribution is simply replaced by the generalized $K$-norm distribution from [HT10]. Intuitively, we are able to show that the generalized $K$-norm distribution "approximates a Gaussian" well enough for our analysis to go through. A main tool in our analysis is a classical concentration of measure inequality from convex geometry.



We begin with a slight generalization of the main upper bound result of Hardt and Talwar [HT10]. This generalization follows directly from the methods used in [HT10] with only minor modifications in the proofs; we omit a full derivation here. Also, while the methods of Hardt and Talwar lead to a proof conditional on the truth of the Hyperplane conjecture from convex geometry, using the ideas of Bhaskara et al. [BDKT12] the result can be made unconditional.



Theorem 18 ([HT10, BDKT12]) Let $A = (a_i)_{i=1}^N$ be a $d\times N$ real matrix and let $K = \mathrm{sym}\{a_i\}_{i=1}^N$. There exists an efficiently computable and efficiently sampleable distribution $W(A,\epsilon)$ such that the following claims hold:

1. the algorithm $\mathcal{M}_K$ which on input $x$ outputs $Ax + w$ for $w \sim W(A,\epsilon)$ satisfies $(\epsilon,0)$-differential privacy;

2. $W(A,\epsilon)$ is identical to the distribution of the random variable $m\sum_{\ell=1}^m w_\ell$, where $m = O(\log d)$ and $w_\ell$ is a sample from a log-concave distribution with support lying in a subspace $V_\ell$ of $\mathbb{R}^d$ of dimension $d_\ell$;

3. $V_\ell$ and $V_{\ell'}$ for $\ell' \ne \ell$ are orthogonal, and the union of $\{V_\ell\}_{\ell=1}^m$ spans $\mathbb{R}^d$;

4. let $M_\ell = M_\ell(A,\epsilon) = \mathbb{E}\,m^2w_\ell w_\ell^T$ be the correlation matrix of $mw_\ell$, and let $\Pi_\ell$ be the projection matrix onto $\mathrm{span}\{V_j\}_{j=\ell}^m$; then

$$\lambda_{\max}(M_\ell) \le O(\log^2 d)\,\frac{d_\ell}{\epsilon^2}\,\mathrm{vrad}(\Pi_\ell K)^2,$$

where $\lambda_{\max}(M_\ell)$ is the largest eigenvalue of $M_\ell$.

Using the distribution $W(A,\epsilon)$, we define our near-optimal sparse-case algorithm satisfying pure differential privacy as Algorithm 4.

Algorithm 4 Least Squares Mechanism: Pure Privacy

Input (Public): query matrix $A = (a_i)_{i=1}^N \in \mathbb{R}^{d\times N}$ ($\mathrm{rank}\,A = d$); database size bound $n$
Input (Private): database $x \in \mathbb{R}^N$

Let $U_1, \ldots, U_k$ be the base decomposition computed by Algorithm 1 on input $A$, where $U_i$ is an orthonormal basis for a space of dimension $d_i$;
Let $t$ be the largest integer such that $d_t \ge \epsilon n$;
for all $i \le t$ do
    Compute $y_i = U_i(U_i^TAx + w_i)$, where $w_i \sim W(U_i^TA, \frac{\epsilon}{t+1})$;
end for
Let $\tilde y' = \sum_{i=1}^t y_i$;
Let $X = (U_1, \ldots, U_t)$ and $Y = (U_{t+1}, \ldots, U_k)$;
Compute $y'' = Y(Y^TAx + w'')$, where $w'' \sim W(Y^TA, \frac{\epsilon}{t+1})$;
Compute $\hat y' = \arg\min\{\|\hat y - \tilde y'\|_2^2 : \hat y \in nXX^TK\}$, where $K = AB_1^N$;
Output $\hat y' + y''$.
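The privacy accounting in Algorithm 4 consists of $t+1$ invocations of the generalized $K$-norm mechanism, each with budget $\epsilon/(t+1)$. A small sketch of how $t$ and the per-call budget could be computed from the base-decomposition dimensions; the function name is hypothetical, and we assume the dimensions are listed in non-increasing order.

```python
def split_budget(dims, eps, n):
    """Return (t, per_call_eps) for Algorithm 4.

    dims: base-decomposition dimensions d_1 >= d_2 >= ... >= d_k (assumed non-increasing).
    t is the largest index with d_t >= eps * n (0 if there is none). The t calls with
    W(U_i^T A, eps / (t + 1)) plus the single call with W(Y^T A, eps / (t + 1))
    use eps in total under basic composition.
    """
    t = max((i for i, di in enumerate(dims, start=1) if di >= eps * n), default=0)
    return t, eps / (t + 1)


t, per_call = split_budget([8, 4, 2, 1], 0.5, 6)  # eps * n = 3, so d_1 = 8 and d_2 = 4 qualify
```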
Theorem 19 Let $\mathcal{M}_p(A, x, n)$ be the output of Algorithm 4 on input a $d\times N$ query matrix $A$, database size bound $n$ and private input $x$. $\mathcal{M}_p(A,x,n)$ is $(\epsilon,0)$-differentially private and, for all small enough $\epsilon$ and all $\delta$ small enough with respect to $\epsilon$, satisfies

$$\mathrm{err}_{\mathcal{M}_p}(A) = O(\log^4 d\log^{3/2}N)\,\mathrm{opt}_{\epsilon,\delta}(A,n) + O(\log^5 d)\,\mathrm{opt}_{\epsilon,0}(A,n),$$

$$\mathrm{err}_{\mathcal{M}_p}(A) = O(\log^4 d\log^{3/2}N + \log^7 d\log N)\,\mathrm{opt}_{\epsilon,\delta}(A,n).$$

Once again, privacy follows by a straightforward argument from the privacy of the underlying noise-adding mechanism, in this case the generalized $K$-norm mechanism.

Lemma 13 $\mathcal{M}_p(A, x, n)$ satisfies $(\epsilon,0)$-differential privacy.

Proof: $\mathcal{M}_p(A, x, n)$ is a deterministic function of $y_1, \ldots, y_t$ and $y''$. Each of these $t+1$ quantities is the output of an algorithm satisfying $(\frac{\epsilon}{t+1}, 0)$-differential privacy (by Theorem 18, claim 1). Therefore, by Lemma 2, $\mathcal{M}_p(A,x,n)$ satisfies $(\epsilon,0)$-differential privacy. ■

Next we prove the main technical lemma we need in order to show near-optimality. The analysis is very similar to that of Lemma 9. The main technical challenge is to show that the distribution $W$ has all the properties we needed from the Gaussian distribution: covariance with bounded operator norm and exponential concentration. We use ideas from Section 4.1 and the following variant of a classical concentration of measure inequality due to Borell [Bor75] (proved in the appendix).

Theorem 20 Let $\mu$ be a log-concave distribution over $\mathbb{R}^d$. Assume that $A$ is a symmetric convex subset of $\mathbb{R}^d$ such that $\mu(A) = \theta \ge 2/3$. Then, for every $t \ge 1$ we have

$$\mu[(tA)^c] \le 2^{-(t+1)/2}.$$
We are now ready to prove the counterpart of Lemma 9 for $\mathcal{M}_p$.

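Lemma 14 Let $U_i$, $t$, and $w_i$ be as defined in Algorithm 4. For all small enough $\epsilon$ and all $\delta$ small enough with respect to $\epsilon$,

$$\mathbb{E}\max_{j=1}^N n\Big|\Big\langle a_j, \sum_{i=1}^tU_iw_i\Big\rangle\Big| = O(\log^4 d\log^{3/2}N)\,\mathrm{opt}_{\epsilon,\delta}(A,n).$$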

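Proof: As in Algorithm 3, we define $r_i = \max_{j=1}^N\|U_i^Ta_j\|_2$. Equivalently, $r_i$ is the radius of the smallest $d_i$-dimensional ball which contains $U_i^TK$. In the proof of Lemma 9 we argued that $\mathrm{opt}_{\epsilon,\delta}(A,n) = \Omega(\frac{n}{\epsilon}r_i^2)$ for all small enough $\epsilon$ and all $\delta$ small enough with respect to $\epsilon$.

By Theorem 18, we can write $w_i$ as $m_i\sum_{\ell=1}^{m_i}w_{i\ell}$, where $w_{i\ell}$ is a sample from a log-concave distribution over a subspace $V_{i\ell}$ and $m_i = O(\log d_i)$. Furthermore, all $w_{i\ell}$ for a fixed $i$ are mutually orthogonal. Finally, letting $\Pi_{i\ell}$ be the projection matrix onto $\mathrm{span}\{V_{i\ell'}\}_{\ell'=\ell}^{m_i}$ and $M_{i\ell}$ be the covariance matrix of $m_iw_{i\ell}$, we have (recall that $w_i$ is drawn with privacy parameter $\frac{\epsilon}{t+1}$)

$$\lambda_{\max}(M_{i\ell}) \le O(\log^2 d)\,\frac{(t+1)^2d_{i\ell}}{\epsilon^2}\,\mathrm{vrad}(\Pi_{i\ell}U_i^TK)^2. \qquad (19)$$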

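Therefore, for any $a_j$, we can derive the following bound:

$$\begin{aligned}
\mathbb{E}\,n^2|\langle a_j, U_iw_i\rangle|^2 &= n^2\,\mathbb{E}|\langle U_i^Ta_j, w_i\rangle|^2\\
&\le n^2m_i\sum_{\ell=1}^{m_i}\mathbb{E}|\langle U_i^Ta_j, m_iw_{i\ell}\rangle|^2\\
&= n^2m_i\sum_{\ell=1}^{m_i}(U_i^Ta_j)^TM_{i\ell}(U_i^Ta_j)\\
&\le n^2m_i\sum_{\ell=1}^{m_i}\|U_i^Ta_j\|_2^2\,\lambda_{\max}(M_{i\ell})\\
&\le O(\log^5 d)\,n^2\frac{r_i^2}{\epsilon^2}\sum_{\ell=1}^{m_i}d_{i\ell}\,\mathrm{vrad}(\Pi_{i\ell}U_i^TK)^2.
\end{aligned}$$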



The first bound above follows from the Cauchy-Schwarz inequality and the last bound follows from (19), together with $m_i, t = O(\log d)$. To bound $d_{i\ell}\,\mathrm{vrad}(\Pi_{i\ell}U_i^TK)^2$, recall that $U_i^TK$ is contained in a ball of radius $r_i$, and therefore so is $\Pi_{i\ell}U_i^TK$ for all $\ell$. Therefore, by Theorem 16, $\mathrm{vrad}(\Pi_{i\ell}U_i^TK)^2 = O\big(\log(N/d_{i\ell})/d_{i\ell}\big)\,r_i^2$. Substituting into the bound above and recalling that $\mathrm{opt}_{\epsilon,\delta}(A,n) = \Omega(\frac{n}{\epsilon}r_i^2)$, we get

$$\mathbb{E}\,n^2|\langle a_j, U_iw_i\rangle|^2 = O(\log^6 d\log N)\,\mathrm{opt}_{\epsilon,\delta}(A,n)^2.$$
Thus, applying Cauchy-Schwarz once again, we conclude

$$\mathbb{E}\,n^2\Big|\Big\langle a_j, \sum_{i=1}^tU_iw_i\Big\rangle\Big|^2 \le t\sum_{i=1}^t\mathbb{E}\,n^2|\langle a_j, U_iw_i\rangle|^2 = O(\log^8 d\log N)\,\mathrm{opt}_{\epsilon,\delta}(A,n)^2.$$

For any $j$, the set $\{(w_1, \ldots, w_t) : |\langle a_j, \sum_{i=1}^tU_iw_i\rangle| \le T\}$ is symmetric and convex for any bound $T$. Then, by Chebyshev's inequality and Theorem 20, there exists a constant $C$ such that for any $j$ and $\alpha \ge 2$,

$$\Pr\Big[n\Big|\Big\langle a_j, \sum_{i=1}^tU_iw_i\Big\rangle\Big| > C\alpha\log^4 d\log^{3/2}N\,\mathrm{opt}_{\epsilon,\delta}(A,n)\Big] < N^{-\alpha}.$$

Using a union bound and taking expectations completes the proof. ■
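Proof of Theorem 19: The privacy guarantee follows from Lemma 13. By Lemma 14, analogously to the proof of Theorem 14, we can conclude that

$$\mathbb{E}\big[\|\hat y' - XX^TAx\|_2^2\big] \le 4\,\mathbb{E}\max_{j=1}^N n\Big|\Big\langle a_j, \sum_{i=1}^tU_iw_i\Big\rangle\Big| \le O(\log^4 d\log^{3/2}N)\,\mathrm{opt}_{\epsilon,\delta}(A,n).$$

Moreover, by Theorem 12,

$$\begin{aligned}
\mathbb{E}\big[\|y'' - YY^TAx\|_2^2\big] &\le O(\log^3 d)\,\mathrm{volLB}\big(YY^TA, \tfrac{\epsilon}{t+1}\big)\\
&= O(t^2\log^3 d)\,\mathrm{opt}_{\epsilon,0}(YY^TA)\\
&= O(\log^5 d)\,\mathrm{opt}_{\epsilon,0}(A,n).
\end{aligned}$$

The last bound follows since $YY^T$ is a projection matrix, $d_{t+1} < \epsilon n$ and $t = O(\log d)$. Also, by Theorem 17, $\mathrm{volLB}(YY^TA, \frac{\epsilon}{t+1}) = O(t^2\log^2 d\log(N/d_{t+1}))\,\mathrm{opt}_{\epsilon,\delta}(YY^TA, n)$, and therefore we have $\mathbb{E}[\|y'' - YY^TAx\|_2^2] = O(\log^7 d\log N)\,\mathrm{opt}_{\epsilon,\delta}(A,n)$.

Pythagoras' theorem then implies the result. ■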

5 Universal bounds 

For $d$ linear sensitivity-1 queries, there are known universal bounds on $\mathrm{err}_{\epsilon,\delta}(A,n)$ and $\mathrm{err}_{\epsilon,0}(A,n)$. We note that the sensitivity-1 bound implies that the $\ell_2$ norm of each column is at most $\sqrt d$. This in turn allows us to prove an upper bound on the spectral lower bound, so that the relative guarantees provided by Theorems 13 and 14 can be translated into absolute ones. The resulting bounds can be improved by polylogarithmic factors by using natural simplifications of the relative-error mechanisms and analyzing them directly. We next present these simplifications. The average per-query error bounds resulting from our mechanisms match the best known bounds for $(\epsilon,\delta)$-differential privacy, and improve on the best known bounds for pure differential privacy.

For $(\epsilon,\delta)$-differential privacy, the best known universal upper bound on the total $\ell_2^2$ error when $A \in [0,1]^{d\times N}$ is $O(nd\log d\sqrt{\log N\log(1/\delta)}/\epsilon)$, given by [GRU12]. We note that when $A \in [0,1]^{d\times N}$, one can use $B(0, \sqrt d)$ as an enclosing ellipsoid for $K$. The following simple mechanism is easily seen to be $(\epsilon,\delta)$-DP.
Theorem 21 The mechanism $\mathcal{M}_s$ of Algorithm 5 is $(\epsilon,\delta)$-differentially private and satisfies

$$\mathrm{err}_{\mathcal{M}_s}(A) \le O\big(nd\sqrt{\log(1/\delta)\log N}/\epsilon\big).$$

Proof: The privacy of the mechanism is immediate from Lemma 4. To analyze the error, we use Lemma 1 and the fact that $L = nK$ to bound

$$\mathrm{err}_{\mathcal{M}_s}(A) = \mathbb{E}\big[\|\hat y - Ax\|_2^2\big] \le 4n\,\mathbb{E}\|rw\|_{K^\circ} = 4n\,\mathbb{E}\max_{j=1}^N|\langle a_j, rw\rangle|,$$
Algorithm 5 Simple Noise + Least Squares Mechanism

Input (Public): query matrix $A = (a_j)_{j=1}^N \in [0,1]^{d\times N}$
Input (Private): database $x \in \mathbb{R}^N$

Let $c_{\epsilon,\delta} = \frac{1+\sqrt{2\ln(1/\delta)}}{\epsilon}$;
Let $r = \max_{j=1}^N\|a_j\|_2$;
Sample $w \sim N(0, c_{\epsilon,\delta}^2)^d$;
Let $\tilde y = Ax + rw$;
Let $\hat y = \arg\min\{\|\hat y - \tilde y\|_2^2 : \hat y \in nK\}$, where $K = AB_1^N$;
Output $\hat y$.
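A minimal pure-Python sketch of the data flow in Algorithm 5. It assumes $c_{\epsilon,\delta} = (1 + \sqrt{2\ln(1/\delta)})/\epsilon$ for the Gaussian noise scale and approximates the least squares step over $nK$ by Frank-Wolfe iterations over the generators $\pm na_j$ (approximate least squares suffices by Section 3.4.2). Function names and the iteration count are hypothetical, and the sketch is not a vetted differentially private implementation; for instance, it draws noise from Python's non-cryptographic `random` module.

```python
import math
import random


def c_eps_delta(eps, delta):
    # Assumed Gaussian-noise constant c_{eps,delta} = (1 + sqrt(2 ln(1/delta))) / eps.
    return (1.0 + math.sqrt(2.0 * math.log(1.0 / delta))) / eps


def simple_noise_least_squares(A, x, n, eps, delta, iters=2000, rng=None):
    """Sketch of Algorithm 5: Gaussian noise, then least squares over nK = n * sym{a_j}."""
    rng = rng or random.Random()
    d, N = len(A), len(A[0])
    cols = [[A[i][j] for i in range(d)] for j in range(N)]
    r = max(math.sqrt(sum(v * v for v in col)) for col in cols)
    c = c_eps_delta(eps, delta)
    # y_tilde = A x + r w with w ~ N(0, c^2)^d (random.gauss takes the standard deviation).
    y_tilde = [sum(A[i][j] * x[j] for j in range(N)) + r * rng.gauss(0.0, c) for i in range(d)]
    # Least squares over nK, approximated by Frank-Wolfe on the generators +-n a_j.
    verts = [[n * v for v in col] for col in cols]
    y = [0.0] * d
    for t in range(iters):
        grad = [2.0 * (yi - ti) for yi, ti in zip(y, y_tilde)]
        scores = [sum(g * v for g, v in zip(grad, vj)) for vj in verts]
        j = max(range(N), key=lambda k: abs(scores[k]))
        sign = -1.0 if scores[j] > 0 else 1.0
        gamma = 2.0 / (t + 2.0)
        y = [(1.0 - gamma) * yi + gamma * sign * vj for yi, vj in zip(y, verts[j])]
    return y
```

For example, with $A = I_2$ the output always lies in the cross-polytope $n\,\mathrm{sym}\{e_1, e_2\}$, so its $\ell_1$ norm is at most $n$ regardless of how large the sampled noise is.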



where $\{a_j\}_{j=1}^N$ are the columns of $A$. We have used the fact that $\|rw\|_{K^\circ} = \max_{a\in K}\langle a, rw\rangle$ is attained at one of the vertices of $K$. Since each $\langle a_j, rw\rangle$ is a Gaussian with variance $r^2\|a_j\|_2^2c_{\epsilon,\delta}^2 \le r^4c_{\epsilon,\delta}^2$, $|\langle a_j, rw\rangle|$ exceeds $2r^2c_{\epsilon,\delta}\sqrt{\ln N}$ with probability at most $2/N^2$. Taking a union bound, we conclude that the expectation of the maximum is $O(r^2c_{\epsilon,\delta}\sqrt{\log N})$. Recall that $r = \max_j\|a_j\|_2 \le \sqrt d$. It follows that

$$\mathrm{err}_{\mathcal{M}_s}(A) \le O\big(nd\,c_{\epsilon,\delta}\sqrt{\log N}\big).$$

Comparing with [GRU12], our bound is better by an $O(\log d)$ factor. However, the previous bound is stronger in that it guarantees expected squared error $O_{\epsilon,\delta}(n\sqrt{\log N}\log d)$ for every query, while we can only bound the total $\ell_2^2$ error.



For pure $\epsilon$-DP, we simply use the generalized $K$-norm distribution guaranteed by Theorem 18 instead of the Gaussian noise.

Algorithm 6 $K$-norm Noise + Least Squares Mechanism

Input (Public): query matrix $A = (a_j)_{j=1}^N \in [0,1]^{d\times N}$
Input (Private): database $x \in \mathbb{R}^N$

Let $\tilde y = Ax + w$, where $w \sim W(A,\epsilon)$;
Let $\hat y = \arg\min\{\|\hat y - \tilde y\|_2^2 : \hat y \in nK\}$, where $K = AB_1^N$;
Output $\hat y$.

We first observe an upper bound on the volume radius of projections of $K$.

Lemma 15 Let $A \in [0,1]^{d\times N}$ and let $K = \mathrm{sym}\{a_1, \ldots, a_N\}$. Let $\Pi^{(k)}$ be a rank-$k$ orthogonal projection that maps $\mathbb{R}^d$ to $\mathbb{R}^k$. Then

$$\mathrm{vrad}(\Pi^{(k)}K) \le O\Big(\sqrt{\frac{\log(N/k)}{k}}\Big)\sqrt d.$$

Proof: Since each column of $A$ has $\ell_2$ norm at most $\sqrt d$, $\Pi^{(k)}K$ is contained in a ball of radius $\sqrt d$. Theorem 16 then immediately implies the claimed bound. ■

Now we show that Algorithm 6 achieves the bound claimed in Theorem 5.

Theorem 22 The mechanism $\mathcal{M}_{sp}$ of Algorithm 6 is $(\epsilon,0)$-differentially private and satisfies

$$\mathrm{err}_{\mathcal{M}_{sp}}(A) \le O(nd\epsilon^{-1}\log^2 d\log^{3/2}N).$$



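Proof: The privacy property follows from Theorem 18 and the fact that post-processing preserves $(\epsilon,0)$-differential privacy. To prove the error bound, we need to upper bound $\mathbb{E}_w[\|w\|_{K^\circ}]$ for a polytope $K \subseteq \mathbb{R}^d$ with $N$ vertices, where $w$ is drawn from $W(A,\epsilon)$. It therefore suffices to bound $\mathbb{E}_w[\max_i|\langle a_i, w\rangle|]$.

By Theorem 18, we can write $w$ as $m\sum_{\ell=1}^m w_\ell$, where $w_\ell$ is drawn from a log-concave distribution and $m = O(\log d)$. For a fixed $\ell$,

$$\mathbb{E}\big[|\langle a_i, mw_\ell\rangle|^2\big] = a_i^TM_\ell a_i \le \|a_i\|_2^2\cdot\lambda_{\max}(M_\ell) \le O(d\log^2 d)\,\frac{d_\ell}{\epsilon^2}\,\mathrm{vrad}(\Pi_\ell K)^2.$$

By Lemma 15, this is at most $O(d\log^2 d\log(N/d_\ell)\,d/\epsilon^2) \le O(\frac{d^2}{\epsilon^2}\log^2 d\log N)$. Then, by Cauchy-Schwarz,

$$\mathbb{E}\Big[\Big|\Big\langle a_i, \sum_{\ell=1}^m mw_\ell\Big\rangle\Big|^2\Big] \le m\sum_{\ell=1}^m\mathbb{E}\big[|\langle a_i, mw_\ell\rangle|^2\big] \le O\Big(\frac{d^2}{\epsilon^2}\log^4 d\log N\Big).$$

By Theorem 20, $\Pr\big[|\langle a_i, \sum_\ell mw_\ell\rangle| > t\log N\cdot O(\frac{d}{\epsilon}\log^2 d\sqrt{\log N})\big] \le N^{-\Omega(t)}$. It follows that $\mathbb{E}_w[\|w\|_{K^\circ}]$ is bounded by $O(\frac{d}{\epsilon}\log^2 d\log^{3/2}N)$.

It then follows that $\mathbb{E}[\|\hat y - Ax\|_2^2]$ is $O(nd\epsilon^{-1}\log^2 d\log^{3/2}N)$. ■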

6 Extensions 

In this section we describe a couple of extensions to our results. We show how to translate our optimality 
guarantees for total squared error in the dense case regime to worst case error over queries using the minimax 
theorem. We further show that our nearly optimal efficient mechanism in the dense case regime implies a 
polylogarithmic approximation to hereditary discrepancy. 

6.1 Expected Error 

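For $i \in [d]$, let $a^{(i)}$ denote the $i$-th row of $A$ and let $\mathcal{M}(x)_i$ denote the $i$-th coordinate of the answer $\mathcal{M}(x)$. Let $\mathrm{err}^{\ell_\infty}_{\mathcal{M}}(A) = \max_{x\in\mathbb{R}^N}\mathbb{E}[\|Ax - \mathcal{M}(x)\|_\infty]$ denote the worst-case (over databases) expected error of a mechanism $\mathcal{M}$ for a query matrix $A$. Here, as usual, the expectation is taken over the internal coin tosses of $\mathcal{M}$. Thus we measure the expected worst-case error $\mathbb{E}[\max_i|\langle a^{(i)}, x\rangle - \mathcal{M}(x)_i|]$. Let $\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A) = \min_{\mathcal{M}\text{ is }(\epsilon,\delta)\text{-DP}}\mathrm{err}^{\ell_\infty}_{\mathcal{M}}(A)$ denote the minimum error over all $(\epsilon,\delta)$-differentially private mechanisms. In the dense case, our results can be extended to the $\ell_\infty$ error at the cost of an additional $O(\log d)$ loss in the competitive ratio.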

We derive such an extension in two steps. First, we give a mechanism for which the worst-case expected squared error $\max_i\mathbb{E}[|\langle a^{(i)}, x\rangle - \mathcal{M}(x)_i|^2]$ is small. We then use this mechanism as a black box to derive one which has small expected error. For a mechanism $\mathcal{M}$, let $E_i^{\mathcal{M},A} = \mathbb{E}[|\langle a^{(i)}, x\rangle - \mathcal{M}(x)_i|^2]$ denote the expected squared error of the mechanism in the $i$-th coordinate.

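Theorem 23 Let $A \in \mathbb{R}^{d\times N}$, $\epsilon$, $\delta$ be given. There is an $(\epsilon,\delta)$-DP mechanism $\mathcal{M}_{we}$ and a non-negative diagonal matrix $P$ with $\mathrm{tr}(P^2) = 1$ such that for all $i \in [d]$,

$$E_i^{\mathcal{M}_{we},A} \le O(\log^2 d\log(1/\delta))\,\mathrm{specLB}(PA) \le O(\log^2 d\log(1/\delta))\,\big(\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A)\big)^2.$$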



Proof: Recall that for any $A$, the mechanism $\mathcal{M}_G^A$ of Section 3.2 is $(\epsilon,\delta)$-DP and has total expected $\ell_2^2$ error at most $\beta = O(\log^2 d\log(1/\delta))$ times the lower bound $\mathrm{specLB}(A)$. We will use this mechanism as a subroutine to derive $\mathcal{M}_{we}$.

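Let $(p_1, p_2, \ldots, p_d)$ denote a probability distribution, so that $p_i > 0$ and $\sum_i p_i = 1$. Let $P$ denote the diagonal matrix with entries $\sqrt{p_i}$. It is easy to see that $\mathrm{err}^{\ell_2^2}_{\mathcal{M}}(PA) = \sum_i p_iE_i^{\mathcal{M},A}$. It follows that $\mathrm{opt}^{\ell_2^2}_{\epsilon,\delta}(PA) \le (\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A))^2$: indeed, the mechanism $\mathcal{M}$ achieving $\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A)$ gives this error bound.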

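Thus the mechanism $\mathcal{M}_G^{PA}$ satisfies

$$\sum_i p_iE_i^{\mathcal{M}_G^{PA},A} = \mathrm{err}_{\mathcal{M}_G^{PA}}(PA) \le \beta\,\mathrm{specLB}(PA) \le \beta\,\mathrm{opt}^{\ell_2^2}_{\epsilon,\delta}(PA).$$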



It follows that $\sum_i p_iE_i^{\mathcal{M}_G^{PA},A} \le \beta\cdot(\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A))^2$. Consider a two-player zero-sum game where the row player selects a query from $A$, and the column player picks an $(\epsilon,\delta)$-DP mechanism $\mathcal{M}$ and must pay the row player $E_i^{\mathcal{M},A}$. The above discussion shows that for any randomized strategy (given by $P$) of the row player, the column player has a strategy that guarantees payoff at most $\beta\cdot(\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A))^2$. By the minimax theorem, this also upper bounds the value of the game. Moreover, using standard constructive versions of the minimax theorem (see, e.g., [AHK12]), one can come up with such a distribution $\mathcal{D}$ over $O(d\log d)$ $(\epsilon,\delta)$-DP mechanisms, and a $P$ such that for all $i \in [d]$,

$$\mathbb{E}_{\mathcal{M}\sim\mathcal{D}}\big[E_i^{\mathcal{M},A}\big] \le 2\beta\,\mathrm{specLB}(PA) \le 2\beta\,(\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A))^2.$$

Recall that the mechanism $\mathcal{M}_G^A$ is of the form $Ax + w$, where $w$ is a noise vector whose distribution is independent of $x$. Thus the pair $(\mathcal{D}, P)$ can be computed without looking at the database $x$. Therefore, the mechanism $\mathcal{M}_{we}$ that samples $\mathcal{M}$ from $\mathcal{D}$ and runs it on $x$ is itself $(\epsilon,\delta)$-DP. The theorem follows. ■
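The constructive minimax step can be illustrated with multiplicative weights, in the spirit of the [AHK12] framework cited above. In this sketch (all names hypothetical) the column player's options are a finite list of candidate mechanisms whose per-query squared errors $E_i^{\mathcal{M},A}$ are given as a matrix; in the proof above the best response to $P$ is instead the mechanism $\mathcal{M}_G^{PA}$, which this sketch does not implement. The averaged column plays form the mixture $\mathcal{D}$, and the row weights play the role of $(p_i)$.

```python
import math


def mw_minimax(cost, T=500):
    """Multiplicative weights for a zero-sum game where column k pays cost[i][k] to row i.

    Rows: queries (row player maximizes). Columns: candidate mechanisms (column player
    minimizes). Returns (p_avg, q): the averaged row distribution and the column mixture.
    """
    d, K = len(cost), len(cost[0])
    eta = math.sqrt(math.log(max(d, 2)) / T)
    cum = [0.0] * d  # cumulative payoff received by each row
    p_avg = [0.0] * d
    q = [0.0] * K
    for _ in range(T):
        top = max(cum)
        w = [math.exp(eta * (ci - top)) for ci in cum]  # shifted for numerical stability
        total = sum(w)
        p = [wi / total for wi in w]
        # Column player's best response to the current row distribution p.
        k = min(range(K), key=lambda c: sum(p[i] * cost[i][c] for i in range(d)))
        q[k] += 1.0 / T
        for i in range(d):
            cum[i] += cost[i][k]
            p_avg[i] += p[i] / T
    return p_avg, q


def worst_row_cost(cost, q):
    """Largest expected payment to any row under the column mixture q."""
    return max(sum(qk * row[k] for k, qk in enumerate(q)) for row in cost)
```

With two queries and two mechanisms that each answer one query perfectly and the other with squared error 1, the returned mixture is close to uniform and every query's expected squared error is about 1/2, the value of that game.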

Using the above result, we can now construct a mechanism that has small expected worst-case error.

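Theorem 24 Let $A \in \mathbb{R}^{d\times N}$, $\epsilon$, $\delta$ be given. There is an $(\epsilon,\delta)$-DP mechanism $\mathcal{M}_{ew}$ and a non-negative diagonal matrix $P$ with $\mathrm{tr}(P^2) = 1$ such that

$$\big(\mathrm{err}^{\ell_\infty}_{\mathcal{M}_{ew}}(A)\big)^2 \le O\big(\log^4 d\log((\log d)/\delta)\big)\cdot\mathrm{specLB}(PA) \le O\big(\log^4 d\log((\log d)/\delta)\big)\cdot\big(\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A)\big)^2.$$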



Proof: Our mechanism with small expected error simply runs $L = O(\log d)$ copies of $\mathcal{M}_{we}(x)$ and returns the median answer for each coordinate. In other words, if $y^j$ denotes the outcome of the $j$-th run of the mechanism $\mathcal{M}_{we}$, then we set $z_i = \mathrm{median}(y_i^1, \ldots, y_i^L)$.

By Markov's inequality, we know that for each $i, j$, $\Pr[(y_i^j - (Ax)_i)^2 > kE_i^{\mathcal{M}_{we},A}] \le \frac1k$. Applying Chernoff bounds, we conclude that $\Pr[(z_i - (Ax)_i)^2 > kE_i^{\mathcal{M}_{we},A}] \le (2/k)^{CL}$ for a constant $C$. By a union bound and Theorem 23, we conclude that $\Pr[\|z - Ax\|_\infty^2 > k\cdot O(\beta)(\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A))^2] \le d(2/k)^{CL}$. Taking $L = \Omega(\log d)$ suffices to ensure that the integral of this probability over $k \in [4,\infty)$ converges, giving a bound on the expected norm of the error.



The resulting mechanism is not necessarily $(\epsilon,\delta)$-differentially private any more. However, since its outcome can be computed by post-processing the outcomes of $L$ $(\epsilon,\delta)$-differentially private mechanisms, it is still $(L\epsilon, L\delta)$-differentially private. Scaling $\epsilon$ and $\delta$ by a factor of $L$, and substituting for $\beta$, we get the result. ■
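The median step in this proof is straightforward to express in code. The sketch below assumes a hypothetical `mechanism` callable standing in for $\mathcal{M}_{we}$, whose construction is not implemented here, and returns coordinate-wise medians of $L$ independent runs. By basic composition, the result is $(L\epsilon, L\delta)$-differentially private whenever each run is $(\epsilon,\delta)$-differentially private.

```python
import statistics


def median_of_runs(runs):
    """Coordinate-wise median of L answer vectors: z_i = median(y_i^1, ..., y_i^L)."""
    d = len(runs[0])
    return [statistics.median([run[i] for run in runs]) for i in range(d)]


def boosted(mechanism, x, L):
    """Run a (hypothetical) noisy mechanism L times on the same database x and take medians."""
    return median_of_runs([mechanism(x) for _ in range(L)])
```

A single run with a huge error in one coordinate (for example, the third run in `[[1.0, 10.0], [2.0, 0.0], [100.0, 1.0]]`) does not move the median in that coordinate, which is why a Markov bound per run turns into a tail bound that decays exponentially in $L$.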

6.2 Approximating Hereditary Discrepancy 

In this section we show that our optimal dense-case mechanism implies a polylogarithmic approximation to hereditary discrepancy. In particular, the mechanism optimal for $\ell_2^2$ error can be used to approximate $\ell_2$ discrepancy, and the mechanism optimal for $\ell_\infty$ error can be used to approximate $\ell_\infty$ discrepancy. The $\ell_\infty$ version of hereditary discrepancy is NP-hard to approximate within a factor of 3/2 (proof in appendix). Showing superconstant hardness for approximating any version of hereditary discrepancy, constant hardness for approximating $\ell_2$ hereditary discrepancy, and determining whether hereditary discrepancy can be exactly computed in nondeterministic polynomial time remain open problems.



Muthukrishnan and Nikolov [MN12] show that hereditary discrepancy gives a lower bound on the error of any mechanism.

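Theorem 25 ([MN12], Corollary 1) There exist constants $\epsilon$, $\delta$ such that the following holds: Let $A$ be a $d\times N$ matrix and $\mathcal{M}$ be an $(\epsilon,\delta)$-differentially private mechanism. Then

$$\mathrm{herdisc}^{\ell_\infty}(A) \le O(\log N)\cdot\mathrm{err}^{\ell_\infty}_{\mathcal{M}}(A).$$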



Fix constants $\epsilon$ and $\delta$ satisfying Theorem 25. Then the theorem implies that $\mathrm{herdisc}^{\ell_\infty}(A) \le O(\log N)\cdot\mathrm{opt}^{\ell_\infty}_{\epsilon,\delta}(A)$. It is easy to see that for any positive semidefinite diagonal matrix $P$ with $\mathrm{tr}(P^2) = 1$, $(\mathrm{herdisc}^{\ell_\infty}(A))^2 \ge (\mathrm{herdisc}^{\ell_2}(PA))^2$: indeed, for any $S \subseteq [N]$, there exists a coloring $z$ supported on $S$ such that $\|Az\|_\infty \le \mathrm{herdisc}^{\ell_\infty}(A)$. Since $\|PAz\|_2^2 = \sum_i P_{ii}^2(Az)_i^2 \le \|Az\|_\infty^2$, the claim follows. Moreover, recall that $(\mathrm{herdisc}^{\ell_2}(PA))^2 \ge c\cdot\mathrm{specLB}(PA)$ for some absolute constant $c$. Thus

$$\mathrm{specLB}(PA) \le \frac{1}{c}\big(\mathrm{herdisc}^{\ell_\infty}(A)\big)^2.$$

On the other hand, the mechanism of Theorem 24 satisfies

$$\big(\mathrm{err}^{\ell_\infty}_{\mathcal{M}_{ew}}(A)\big)^2 \le O\big(\log^4 d\log((\log d)/\delta)\big)\,\mathrm{specLB}(PA).$$



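Thus we have

$$\begin{aligned}
\big(\mathrm{herdisc}^{\ell_\infty}(A)\big)^2 &\le O(\log^2 N)\,\big(\mathrm{err}^{\ell_\infty}_{\mathcal{M}_{ew}}(A)\big)^2\\
&\le O\big(\log^4 d\log^2 N\log((\log d)/\delta)\big)\cdot\mathrm{specLB}(PA).
\end{aligned}$$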

We have sandwiched $\mathrm{herdisc}^{\ell_\infty}(A)^2$ between two efficiently computable quantities that are $O(\log^4 d\log^2 N\log\log d)$ apart. It follows that:

Theorem 26 There is a polynomial-time algorithm that, given a $d\times N$ real matrix $A$, outputs an $O(\log^2 d\log N\sqrt{\log\log d})$ approximation to $\mathrm{herdisc}^{\ell_\infty}(A)$.

The description of this mechanism $\mathcal{M}_{ew}$ (which is specified by a distribution over $O(\log d)$ Gaussian noise addition mechanisms) thus serves as an efficiently computable and verifiable witness to an upper bound on the hereditary discrepancy of $A$.



We next give a constructive version of this result by appealing to a result of Bansal [Ban10], which shows that the discrepancy of any set system can be constructively upper bounded by $O(\log dN)$ times the hereditary vector discrepancy. The vector discrepancy of a matrix $A$, denoted $\mathrm{vecdisc}(A)$, is defined as the smallest $\lambda$ such that the following semidefinite program is feasible:

SDP VecDisc: $A \in \mathbb{R}^{d\times m}$

$$\begin{aligned}
a^{(i)}Va^{(i)T} &\le \lambda && \forall i \in [d]\\
V_{jj} &\le 1 && \forall j \in [m]\\
\sum_{j=1}^m V_{jj} &\ge m/8 && (20)\\
V &\succeq 0
\end{aligned}$$

The hereditary vector discrepancy is simply $\mathrm{hervecdisc}(A) = \max_{S\subseteq[N]}\mathrm{vecdisc}(A|_S)$. We will show that, given the mechanism $\mathcal{M}_{ew}$ of Theorem 24, we can construct for any $S$ a solution to the SDP corresponding to $S$. We note that the above SDP is a slight relaxation of the SDP used by Bansal: instead of the constraint $V_{jj} = 1$ for all $j$, we simply require each $V_{jj}$ to be at most one, and the trace of $V$ to be $\Omega(m)$. It is easy to verify that [Ban10] implies the following.

Theorem 27 ([Ban10]) Suppose that for a matrix $A \in \mathbb{R}^{d\times N}$ and a $\lambda > 0$, for every $S \subseteq [N]$ with $\mathrm{rank}(A|_S) = |S|$, the SDP (20) is feasible for $A|_S$ and $\lambda$. Then there is a polynomial-time algorithm that finds a coloring $\chi$ of $[N]$ with discrepancy at most $O(\lambda\log dN)$.

We consider a variant of the above SDP, where we drop the $V_{jj} \le 1$ constraint and require the trace of $V$ to be slightly larger:

SDP RelVecDisc ($A \in \mathbb{R}^{d \times m}$):

$$a_{(i)} V a_{(i)}^T \le \lambda \qquad \forall i \in [d]$$
$$\sum_{j=1}^{m} V_{jj} \ge m \qquad (21)$$
$$V \succeq 0$$

We first show that it suffices to satisfy the SDP (21).

Lemma 16 Suppose that for a matrix $A \in \mathbb{R}^{d \times N}$, for every $S \subseteq [N]$ with $\mathrm{rank}(A|_S) = |S|$, the SDP (21) is feasible for $A|_S$ and $\lambda$. Then for any $S \subseteq [N]$ with $\mathrm{rank}(A|_S) = |S|$, the SDP (20) is feasible for $A|_S$ and $2\lambda$.
Proof: We construct a feasible solution to (20) by repeatedly using solutions to (21) for different restrictions $A|_S$. Fix an $S \subseteq [N]$ with $|S| = m$ and let $S_0 = S$. Let $W^0$ be the $m \times m$ zero matrix indexed by $S$. Let $V^0$ be a solution to SDP (21) for $A|_{S_0}$, let $\gamma_0 = \max_j V^0_{jj}$, and let $\hat{V}^0 = V^0/2\gamma_0$. For each $j, j' \in S_0$, set $W^1_{jj'} = W^0_{jj'} + \hat{V}^0_{jj'}$. Finally, set $S_1 = S_0 \setminus \{j : W^1_{jj} > 1/4\}$.

Given $S_t$ such that $|S_t| > m/2$, we let $V^t$ be a feasible solution to SDP (21) for $A|_{S_t}$ (it exists by hypothesis, since $S_t \subseteq S$ implies $\mathrm{rank}(A|_{S_t}) = |S_t|$), and set $\gamma_t = \max_j V^t_{jj}$ and $\hat{V}^t = V^t/2\gamma_t$. For each $j, j' \in S_t$, set $W^{t+1}_{jj'} = W^t_{jj'} + \hat{V}^t_{jj'}$. We update $S_{t+1} = S_t \setminus \{j : W^{t+1}_{jj} > 1/4\}$. We stop once $|S_t| \le m/2$, say when $t = L$. We delete at least one $j$ from $S_t$ in each step (the index attaining $\gamma_t$ has $\hat{V}^t_{jj} = 1/2 > 1/4$), so this process terminates.



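We claim that $W^L$ is a feasible solution to SDP (20). Observe that $W^L$ is simply a non-negative linear combination of the matrices $V^t$ (padded with zeros outside $S_t \times S_t$), and hence is positive semidefinite. By definition of $\hat{V}^t$, each of its diagonal entries is at most $1/2$, so the definition of $S_t$ ensures that $W^L_{jj} \le 3/4 < 1$ for all $j$. Moreover, for each $j \in S \setminus S_L$ the entry $W^L_{jj} > 1/4$, and there are at least $m/2$ such $j$, so the trace of $W^L$ is at least $m/8$, as required. Finally, $a_{(i)} W^L a_{(i)}^T$ is simply the sum $\sum_{t=0}^{L-1} (2\gamma_t)^{-1} a_{(i)} V^t a_{(i)}^T \le \lambda \sum_{t=0}^{L-1} (2\gamma_t)^{-1}$. Thus it suffices to show that $\sum_{t=0}^{L-1} (2\gamma_t)^{-1} \le 2$. Note that $\mathrm{tr}(W^{t+1} - W^t) = \mathrm{tr}(\hat{V}^t) \ge (2\gamma_t)^{-1}|S_t| \ge (2\gamma_t)^{-1} m/2$. It follows that $m \ge \mathrm{tr}(W^L) \ge \sum_{t=0}^{L-1} (2\gamma_t)^{-1} m/2$, so that $\sum_{t=0}^{L-1} (2\gamma_t)^{-1} \le 2$, as needed. The claim follows. ■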
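The iterative construction in the proof of Lemma 16 can be sketched as follows. This is a minimal sketch assuming NumPy; `oracle` is a hypothetical stand-in for any solver of SDP (21) on $A|_{S_t}$ (it must return a PSD $|S_t| \times |S_t|$ matrix of trace at least $|S_t|$), and the function name `combine_relaxed_solutions` is likewise hypothetical. It reproduces only the rescale, accumulate, and prune loop, and returns $W^L$ together with the weights $(2\gamma_t)^{-1}$.

```python
import numpy as np

def combine_relaxed_solutions(m, oracle, max_iters=10_000):
    """Build W^L from repeated SDP (21) solutions, as in the proof of Lemma 16.

    oracle(S) (hypothetical) returns a PSD |S| x |S| matrix V with trace >= |S|.
    Returns the m x m matrix W and the list of weights 1 / (2 * gamma_t).
    """
    W = np.zeros((m, m))
    S = list(range(m))
    weights = []
    for _ in range(max_iters):
        if len(S) <= m / 2:          # stop once |S_t| <= m/2
            break
        V = np.asarray(oracle(S), dtype=float)
        gamma = np.diag(V).max()
        V_hat = V / (2 * gamma)      # diagonal entries now at most 1/2
        W[np.ix_(S, S)] += V_hat     # add V_hat on the S_t x S_t block
        weights.append(1 / (2 * gamma))
        # S_{t+1} = S_t minus the indices whose diagonal entry exceeds 1/4.
        S = [j for j in S if W[j, j] <= 0.25]
    return W, weights
```

With a toy oracle returning a random PSD matrix rescaled to trace $|S_t|$ (which ignores the row constraints, as they play no role in the diagonal and trace bounds), one can check that $W^L$ is PSD, has diagonal at most $3/4$ and trace at least $m/8$, and that the weights sum to at most $2$.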

Finally, we describe how to use the mechanism $\mathcal{M}_{ew}$ of Theorem 24 to construct a feasible solution to SDP (21).
Lemma 17 Suppose that for a matrix $A \in \mathbb{R}^{d \times N}$, we have an oblivious noise $(\epsilon, \delta)$-DP mechanism $\mathcal{M}$ such that $\mathrm{err}_{\mathcal{M}}(A) \le \kappa$. Then for any $S \subseteq [N]$ with $\mathrm{rank}(A|_S) = |S|$, the SDP (21) is feasible for $A|_S$ with $\lambda = O(\kappa^2)$.



Proof: Since an $(\epsilon, \delta)$-DP mechanism for $A$ also yields an $(\epsilon, \delta)$-DP mechanism for any submatrix of $A$, we can assume without loss of generality that $S = [N]$. For $y \in \mathbb{R}^d$, let $\mathrm{inv}_A(y) = \arg\min_x \|Ax - y\|_\infty$. Let $Y$ be a random variable distributed according to $\mathcal{M}(0)$ and let $X = \mathrm{inv}_A(Y)$. Then, by the properties of the mechanism, $\mathbb{E}[\|AX\|_\infty] \le \mathbb{E}[\|AX - Y\|_\infty + \|Y\|_\infty] \le 2\mathbb{E}[\|Y\|_\infty] \le 2\kappa$, since $x = 0$ is a candidate in the minimization defining $\mathrm{inv}_A(Y)$ and we picked $X$ as a minimizer. Thus if we set $V = \mathbb{E}[XX^T]$, then $a_{(i)} V a_{(i)}^T \le 16\kappa^2$ for each $i$.

It remains to lower bound $V_{jj} = \mathbb{E}[X_j^2] \ge \mathrm{Var}(X_j)$. Intuitively, if $\mathrm{Var}(X_j)$ were small, then $Y$ would give too much information about $X_j$, violating differential privacy. Formally, by differential privacy, the statistical distance between $\mathcal{M}(0)$ and $\mathcal{M}(e_j)$ is at most $2(\epsilon + \delta)$. Thus the distribution of $Y$ is close to that of $Y + Ae_j$. But it is easy to check that $\mathrm{inv}_A(y + Ae_j) = \mathrm{inv}_A(y) + e_j$. Thus the distribution of $X$ is close to that of $X + e_j$. This in turn lower bounds the variance of $X_j$ by an absolute constant. Scaling $V$ by a constant factor implies the result. Finally, we observe that this proof can easily be made constructive. We can take polynomially many samples $x_1, \ldots, x_k$ and set $V$ to the empirical estimate $\frac{1}{k}\sum_{i=1}^{k} x_i x_i^T$ instead of $\mathbb{E}[XX^T]$. Truncating the distribution so as to reject any $x_i$ with $\|Ax_i\|_\infty > d^3\kappa$, or with $\|x_i\|_\infty$ above a comparable polynomial threshold, still preserves the required properties and allows us to use Chernoff bounds to ensure that the empirical estimate satisfies the constraints up to a constant factor. ■
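The constructive step at the end of this proof (sample from $\mathcal{M}(0)$, invert, truncate, and average $x_i x_i^T$) can be sketched as below. Two assumptions are flagged loudly: the noise samples are taken to be isotropic Gaussian rather than the correlated noise of $\mathcal{M}_{ew}$, and $\mathrm{inv}_A$ is replaced by a least-squares inverse (`np.linalg.lstsq`) instead of the $\ell_\infty$ minimizer used in the text, to avoid an LP solver. For $A$ of full column rank the least-squares inverse is linear, so it shares the property $\mathrm{inv}_A(y + Ae_j) = \mathrm{inv}_A(y) + e_j$ used in the privacy argument. The names `inv_ls` and `empirical_second_moment` are hypothetical.

```python
import numpy as np

def inv_ls(A, y):
    # ASSUMPTION: least-squares stand-in for inv_A(y) = argmin_x ||Ax - y||_inf.
    # For full column rank A it is linear in y, so inv(y + A e_j) = inv(y) + e_j.
    return np.linalg.lstsq(A, y, rcond=None)[0]

def empirical_second_moment(A, Y, trunc):
    """Estimate V = E[X X^T] with X = inv(Y), from noise samples Y (one per row).

    Samples with ||A x||_inf > trunc are rejected, mimicking the truncation in
    the proof; trunc is a free parameter here, not the paper's threshold.
    """
    X = np.array([inv_ls(A, y) for y in Y])        # k x N matrix of samples x_i
    keep = np.abs(X @ A.T).max(axis=1) <= trunc    # row i of X @ A.T is (A x_i)^T
    Xk = X[keep]
    if len(Xk) == 0:
        raise ValueError("all samples were truncated")
    return Xk.T @ Xk / len(Xk)                     # (1/k') * sum_i x_i x_i^T
```

For the $\ell_\infty$ minimizer of the text, `inv_ls` would be replaced by a linear program; the averaging and truncation steps would stay the same.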



7 Conclusions 

We have presented near-optimal mechanisms for any set of linear queries over dense and sparse databases, under both pure and approximate differential privacy. Our mechanisms are simple and efficient, and it would be instructive to implement them so as to compare them with existing techniques.

Our work uses the hereditary discrepancy lower bound, which holds for small enough constant $\epsilon$ and $\delta$. Since our lower bounds do not increase as $\delta$ gets smaller, the approximation ratio has an $O(\sqrt{\log 1/\delta})$ term in it. We leave open the question of developing better lower bounding techniques, and better approximation ratios for $(\epsilon, \delta)$-differentially private mechanisms. Our work gives $\ell_2$ bounds on the error. While we can translate those bounds to $\ell_\infty$ error bounds for the dense case, we leave open the question of designing near-optimal $\ell_\infty$ error mechanisms in the sparse case.
A Concentration of Log-Concave Measures

The following measure concentration inequality is standard. We include a proof below for completeness. We start 
by stating the Brunn-Minkowski inequality. 

Theorem 28 Let $\mu$ be a log-concave measure on $\mathbb{R}^d$, let $\alpha, \beta > 0$ be numbers such that $\alpha + \beta = 1$, and let $A, B \subseteq \mathbb{R}^d$ be measurable sets such that the set $\alpha A + \beta B$ is measurable. Then

$$\mu(\alpha A + \beta B) \ge \mu(A)^{\alpha}\,\mu(B)^{\beta}.$$
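A quick numerical sanity check of Theorem 28 in one dimension, assuming the standard Gaussian measure on $\mathbb{R}$ (which is log-concave) and intervals $A = [a_0, a_1]$, $B = [b_0, b_1]$, for which $\alpha A + \beta B = [\alpha a_0 + \beta b_0, \alpha a_1 + \beta b_1]$ is again an interval. The helper names are hypothetical, and the check illustrates the inequality rather than proving it.

```python
import math

def gaussian_measure(lo, hi):
    """Standard Gaussian measure of the interval [lo, hi]."""
    cdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(hi) - cdf(lo)

def brunn_minkowski_sides(A, B, alpha):
    """Return (mu(alpha*A + beta*B), mu(A)**alpha * mu(B)**beta) for intervals A, B."""
    beta = 1.0 - alpha
    # Minkowski combination of two intervals is the interval of combined endpoints.
    lo = alpha * A[0] + beta * B[0]
    hi = alpha * A[1] + beta * B[1]
    lhs = gaussian_measure(lo, hi)
    rhs = gaussian_measure(*A) ** alpha * gaussian_measure(*B) ** beta
    return lhs, rhs
```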
This can be used to prove Borell's lemma for arbitrary log-concave distributions. We use the proof approach presented in [Gia03].



Proof of Theorem 20: Observe that

$$\mathbb{R}^d \setminus A \supseteq \frac{2}{t+1}\left(\mathbb{R}^d \setminus tA\right) + \frac{t-1}{t+1}\,A.$$



Applying the Brunn-Minkowski inequality and rearranging proves the result. ■ 
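Spelled out under the hypotheses of the standard Borell lemma (assumed here, since the statement of Theorem 20 appears in the main text): $A$ is a symmetric convex set with $\mu(A) = \theta$, and $t > 1$. The containment holds because if $z = \frac{2}{t+1}y + \frac{t-1}{t+1}a$ lies in $A$ for some $a \in A$, then $y/t = \frac{t+1}{2t}z + \frac{t-1}{2t}(-a)$ is a convex combination of points of $A$, so $y \in tA$. Applying Theorem 28 then gives:

```latex
\begin{align*}
1-\theta = \mu(\mathbb{R}^d\setminus A)
  &\ge \mu\!\left(\tfrac{2}{t+1}(\mathbb{R}^d\setminus tA)+\tfrac{t-1}{t+1}A\right)
   \ge \mu(\mathbb{R}^d\setminus tA)^{\frac{2}{t+1}}\,\theta^{\frac{t-1}{t+1}},\\
\text{hence}\quad
\mu(\mathbb{R}^d\setminus tA)
  &\le (1-\theta)^{\frac{t+1}{2}}\,\theta^{-\frac{t-1}{2}}
   = \theta\left(\frac{1-\theta}{\theta}\right)^{\frac{t+1}{2}}.
\end{align*}
```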

B From Concentration to Expectation 

We repeatedly use the fact that an exponential or better bound on the upper tail implies a bound on the expectation. For completeness, we give a quantitative version with a proof.

Lemma 18 Suppose that for some positive constants $C, c_1, c_2$, a random variable $X$ satisfies $\Pr[X > \alpha C] \le \exp(-\alpha/c_2)$ for all $\alpha > c_1$. Then $\mathbb{E}[X] \le (c_1 + c_2)C$.



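Proof: Let $Y = (X - c_1 C)/C$. For every $\alpha > 0$ we have $\Pr[Y > \alpha] = \Pr[X > (c_1 + \alpha)C] \le \exp(-(c_1 + \alpha)/c_2) \le \exp(-\alpha/c_2)$. Then, using $Y \le \max(Y, 0)$,

$$\mathbb{E}[Y] \le \int_0^\infty \Pr[Y > \alpha]\, d\alpha \le \int_0^\infty \exp(-\alpha/c_2)\, d\alpha = c_2.$$

Since $X = C\,Y + c_1 C$, the claim follows by linearity of expectation.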
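To make Lemma 18 concrete, the sketch below evaluates the intermediate tail integral from the proof numerically and compares it with $(c_1 + c_2)C$; it also draws samples of $X = C \cdot \mathrm{Exp}(\text{mean } c_2)$, which satisfies the hypothesis with equality. It assumes NumPy; the function names and the constants $C = 3$, $c_1 = 1$, $c_2 = 2$ used in the check are arbitrary illustrative choices.

```python
import numpy as np

def lemma18_bound(C, c1, c2):
    """The bound (c1 + c2) * C on E[X] from Lemma 18."""
    return (c1 + c2) * C

def tail_integral_bound(C, c1, c2, upper=200.0, steps=400_000):
    """c1*C + C * int_0^upper exp(-(c1 + a)/c2) da, via the midpoint rule.

    This is the bound from the proof before exp(-(c1 + a)/c2) is relaxed to
    exp(-a/c2); it equals c1*C + C*c2*exp(-c1/c2) up to truncation error.
    """
    h = upper / steps
    a = (np.arange(steps) + 0.5) * h
    integral = np.exp(-(c1 + a) / c2).sum() * h
    return c1 * C + C * integral
```

The gap between the two bounds is the factor $\exp(-c_1/c_2)$ that the proof discards for simplicity.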
C Hardness of Approximating Hereditary Discrepancy



We show constant hardness of approximating $\mathrm{herdisc}^{\ell_\infty}$. Strong inapproximability results were previously given for discrepancy, in the $\ell_2$ and $\ell_\infty$ versions, by Charikar, Newman, and Nikolov [CNN11].

Let us define

$$\mathrm{disc}^{\ell_\infty}(A) = \min_{x \in \{-1,+1\}^n} \|Ax\|_\infty.$$



Theorem 29 There exists a family of input matrices $A \in \{0,1\}^{m \times n}$ for which it is NP-hard to distinguish between the two cases (1) $\mathrm{herdisc}^{\ell_\infty}(A) \le 2$ and (2) $\mathrm{disc}^{\ell_\infty}(A) \ge 3$.

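Since $\mathrm{herdisc}^{\ell_\infty}(A) \ge \mathrm{disc}^{\ell_\infty}(A)$, the two cases are disjoint, and Theorem 29 implies that $\mathrm{herdisc}^{\ell_\infty}$ cannot be approximated in polynomial time within any factor smaller than $3/2$ unless P = NP. The proof of Theorem 29 is a straightforward reduction from the 2-colorability problem for 3-uniform hypergraphs. A maximization version of this problem (i.e., maximize the number of bichromatic edges) is also known as Max-E3-Set Splitting and is equivalent to Not-All-Equal-SAT restricted to inputs with no negated variables.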

Definition 2 A hypergraph $H = (V, E)$, where $E \subseteq 2^V$, is 2-colorable if and only if there exists a set $T \subseteq V$ such that for all $e \in E$, $T \cap e \ne e$ and $T \cap e \ne \emptyset$. The set $T$ is called a transversal of $H$.

Lemma 19 ([Sch78]) There exists a family of 3-uniform hypergraphs such that deciding whether a hypergraph in the family is 2-colorable is NP-complete.

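Proof of Theorem 29: The reduction simply maps a 3-uniform hypergraph to its incidence matrix, i.e., for a hypergraph $H = (V, E)$, where $V = \{v_1, \ldots, v_n\}$ and $E = \{e_1, \ldots, e_m\}$, we create an $m \times n$ matrix $A$, where $A_{ij} = 1$ if $v_j \in e_i$ and $A_{ij} = 0$ otherwise. Observe that if $H$ is 2-colorable, and this is witnessed by a transversal $T$, then $\|(A|_S)x_S\|_\infty \le 2$ for all $S \subseteq [n]$, where $x$ is defined by $x_i = +1$ if $v_i \in T$ and $x_i = -1$ otherwise: an edge contained in $S$ is bichromatic and contributes $\pm 1$, while any other edge meets $S$ in at most two vertices. On the other hand, if $H$ is not 2-colorable, then for any $x \in \{+1,-1\}^n$ we have $\|Ax\|_\infty \ge 3$: each entry of $Ax$ is a sum of three $\pm 1$ values and hence lies in $\{\pm 1, \pm 3\}$, and if all entries were $\pm 1$, the set $T = \{v_i : x_i = +1\}$ would be a transversal. ■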
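For small instances the reduction can be checked by brute force. The sketch below (NumPy plus `itertools`; all function names are hypothetical) builds the incidence matrix and computes $\mathrm{disc}^{\ell_\infty}$ and $\mathrm{herdisc}^{\ell_\infty}$ by exhaustive search, which is exponential and only meant for tiny hypergraphs. It uses two classical examples: the complete 3-uniform hypergraph on 4 vertices, which is 2-colorable, and the Fano plane, which is not.

```python
import itertools
import numpy as np

def incidence_matrix(n, edges):
    """m x n matrix with A[i, j] = 1 iff vertex j belongs to edge i."""
    A = np.zeros((len(edges), n), dtype=int)
    for i, e in enumerate(edges):
        A[i, list(e)] = 1
    return A

def disc_inf(A):
    """min over x in {-1,+1}^n of ||A x||_inf (exhaustive search)."""
    n = A.shape[1]
    return min(int(np.abs(A @ np.array(x, dtype=int)).max())
               for x in itertools.product((-1, 1), repeat=n))

def herdisc_inf(A):
    """max over column subsets S of disc_inf(A|_S) (exhaustive search)."""
    n = A.shape[1]
    return max(disc_inf(A[:, list(S)])
               for r in range(n + 1)
               for S in itertools.combinations(range(n), r))

# Complete 3-uniform hypergraph on 4 vertices (2-colorable) and the Fano plane (not).
K4 = incidence_matrix(4, list(itertools.combinations(range(4), 3)))
FANO = incidence_matrix(7, [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
                            (1, 4, 6), (2, 3, 6), (2, 4, 5)])
```

For the complete hypergraph on 4 vertices this gives $\mathrm{herdisc}^{\ell_\infty} = 2$, matching case (1); for the Fano plane every 2-coloring leaves a monochromatic line, so $\mathrm{disc}^{\ell_\infty} = 3$, matching case (2).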



We note that Guruswami proved constant hardness of approximation for Max-E3-Set Splitting [Gur00]. In particular, he showed that it is NP-hard to distinguish 2-colorable 3-uniform hypergraphs from hypergraphs for which any coloring with 2 colors leaves at least a 1/20 fraction of the edges monochromatic.